Abstract

This thesis studies the problems associated with adaptive signal processing in the
sample deficient regime using random matrix theory. The scenarios in which the
sample deficient regime arises include, among others, the cases where the number of
observations available in a period over which the channel can be approximated as
time-invariant is limited (wireless communications), the number of available
observations is limited by the measurement process (medical applications), or the
number of unknown coefficients is large compared to the number of observations
(modern sonar and radar systems). Random matrix theory, which studies how different
encodings of eigenvalues and eigenvectors of a random matrix behave, provides suitable
tools for analyzing how the statistics estimated from a limited data set behave with
respect to their ensemble counterparts.

The applications of adaptive signal processing considered in the thesis are (1)
adaptive beamforming for spatial spectrum estimation, (2) tracking of time-varying
channels and (3) equalization of time-varying communication channels. The thesis
analyzes the performance of the considered adaptive processors when operating in
the deficient sample support regime. In addition, it gains insights into the behavior
of different estimators based on the estimated second order statistics of the data
originating from a time-varying environment. Finally, it studies how to optimize the
adaptive processors and algorithms so as to account for deficient sample support and
improve the performance.

In particular, random matrix quantities needed for the analysis are characterized
in the first part. In the second part, the thesis studies the problem of regularization
in the form of diagonal loading for two conventionally used spatial power spectrum
estimators based on adaptive beamforming, and shows the asymptotic properties of
the estimators, studies how the optimal diagonal loading behaves and compares the
estimators on the grounds of performance and sensitivity to optimal diagonal loading.
In the third part, the performance of the least squares based channel tracking
algorithm is analyzed, and several practical insights are obtained. Finally, the
performance of multi-channel decision feedback equalizers in time-varying channels is
characterized, and insights concerning the optimal selection of the number of sensors,
their separation and constituent filter lengths are presented.

Thesis  Supervisor:  Dr.  James  C.  Preisig 
Title: Associate Scientist with Tenure, WHOI

Chapter 1


Introduction 


This thesis analyzes the performance of some types of adaptive signal processing
algorithms when the number of observations available to adapt the characteristics of
the algorithms is small. Adaptive processing algorithms, as considered in this thesis,
are those whose goal is to track unknown parameters in real time but which do not
know a priori the statistics of those parameters or the observations. A general block
diagram of an adaptive processor is shown in Fig. 1-1. The input data are processed
such that the output is in some predefined way close to the reference (desired) signal.
The coefficients, also called weights in a linear processor, are evaluated and updated
based on the input signal, the difference between the obtained and desired outputs and
the optimization criterion (i.e., objective or cost function) [27]. The format of the
input, structure of the processor, objective function and unknown parameters depend on
the specific application. Three applications of adaptive processing are considered in
this thesis. These are the estimation of the spatial spectrum from observations at
an array of sensors, the tracking of time-varying channels, and the equalization of
communications channels.

Figure 1-1: Block diagram of adaptive processor.

1.1  Adaptation  with  Second  Order  Statistics 

1.1.1  Objective  Function  based  on  Second  Order  Statistics 

The objective functions corresponding to the applications of adaptive processing
considered in this thesis are such that the processor coefficients obtained as the
solutions to the corresponding optimization problems depend on second order statistics
of the input data. In other words, the unifying feature of the problems studied here is
that the adaptive processing relies on the second order statistics of the data.

Although  processing  which  utilizes  higher  order  statistics  is  an  option  that  has 
been  extensively  studied  [34],  this  thesis  focuses  on  processing  with  second  order 
statistics  for  at  least  three  reasons. 

First, the second order statistics arise naturally in problems associated with
Gaussian processes (completely characterized by their first and second order statistics)
as well as problems utilizing Minimum Mean Square Error (MMSE) and Least Squares
(LS) error criteria with linear signal and processing models.

Second, the ensemble statistics of the data are unknown in practice and are
estimated from the observed data. As will be argued shortly and studied more extensively
in the thesis, the number of stationary observations is usually not sufficient to
accurately estimate even the second order statistics. Estimating higher order statistics
in this case becomes even more prohibitive.

Third, adaptive processing with higher order statistics requires more computations,
which is a limiting factor in many practical applications. Greater computational
capability requires, in general, a corresponding increase in power consumption,
which is often constrained in processors in underwater acoustic applications [10].

1.1.2  Ensemble  Correlation  and  Sample  Correlation  Matrix 

We emphasize that the objective functions corresponding to different adaptive
processing applications relying on second order statistics are in general different.
However, the solution for the processor weights in all applications depends on the
second order statistics of the input data.

The input (also called received or observed) data at a particular time is a collection
of measurements (i.e., samples) of the received signal. These measurements can be
taken in spatial, delay, or both spatial and delay domains, and are arranged into an
input (also called observation) vector u. In general, $\mathbf{u} \in \mathbb{C}^{m}$,
where $\mathbb{C}$ is the set of complex numbers and m is the dimension of the
observation space.^1

The second order statistics of the input data are captured via correlations
between measurements that constitute the observation vector u. These correlations are
formatted into an input correlation matrix, defined as^2

$$\mathbf{R} = E\left[\mathbf{u}\mathbf{u}^{H}\right]. \qquad (1.1)$$

The expectation in the above definition is taken over the ensemble of observation
vectors. Note that $\mathbf{R} \in \mathbb{C}^{m \times m}$.

^1 Note that the number of degrees of freedom is often smaller than the dimension of the
observation space m. The focus of this thesis is not on developing and addressing the
problems of lower dimensional representations.

^2 This is also the covariance matrix if the input process has zero mean. Without loss of
generality, we assume throughout the thesis that all the input processes have zero mean.

The ensemble statistics of the input signal, and consequently the input correlation
matrix R, are usually unknown and have to be estimated from the observed data.
Assuming the input process is ergodic, the ensemble statistics are estimated via time
averaging. A widely used estimator for the input ensemble correlation matrix is the
sample correlation matrix (SCM). The SCM is defined as

$$\hat{\mathbf{R}} = \frac{1}{n} \sum_{k=1}^{n} \mathbf{u}(k)\,\mathbf{u}^{H}(k), \qquad (1.2)$$

where u(k) is the observation vector received at discrete time k, while n is the length
of the observation window.

It can be observed that the SCM is an unbiased estimator of the input correlation
matrix. Also, the SCM is the maximum likelihood (ML) estimator of the ensemble
correlation matrix when the snapshots are Gaussian distributed [52]. More importantly,
for a fixed and finite number of coefficients m, as the number of observations
$n \to \infty$ [13],

$$\|\hat{\mathbf{R}} - \mathbf{R}\| \to 0 \quad \text{a.s.}, \qquad (1.3)$$

where $\|\cdot\|$ is the spectral norm of a matrix.

A  practical  interpretation  of  the  above  result  is  that  the  SCM  is  an  accurate 
estimate  of  the  input  correlation  matrix  when  the  available  number  of  observations 
n  used  to  compute  the  SCM  is  sufficiently  large.  The  literature  usually  cites  the 
empirical  result  that  n  should  be  3  times  larger  than  m  when  the  input  process  has 
few  dominant  eigenvalues  [43].  Some  remarks  on  this  result  are  made  towards  the 
end  of  the  following  section. 
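
The convergence in (1.3) can be checked numerically. The following is a minimal sketch, not part of the thesis: it assumes circularly symmetric complex Gaussian snapshots and, purely for illustration, an identity ensemble correlation matrix with m = 10. The helper names `sample_correlation_matrix` and `scm_error` are hypothetical.

```python
import numpy as np

def sample_correlation_matrix(U):
    """SCM of (1.2); U is m-by-n with one observation vector per column."""
    n = U.shape[1]
    return (U @ U.conj().T) / n

def scm_error(R, n, rng):
    """Spectral-norm error between the SCM of n snapshots and R."""
    m = R.shape[0]
    L = np.linalg.cholesky(R)
    # Assumption: circularly symmetric complex Gaussian snapshots
    W = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
    U = L @ W                      # snapshots with ensemble correlation R
    return np.linalg.norm(sample_correlation_matrix(U) - R, ord=2)

rng = np.random.default_rng(0)
m = 10
R = np.eye(m)                      # illustrative choice of ensemble correlation
errors = {n: scm_error(R, n, rng) for n in (20, 200, 20000)}
for n, err in errors.items():
    print(f"n = {n:6d}, n/m = {n / m:7.1f}, spectral-norm error = {err:.3f}")
```

Under these assumptions the error should shrink as n grows with m held fixed, which is the regime (1.3) describes. The sketch says nothing about how large n must be in a given application.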

1.2  Deficient  Sample  Support 

As  pointed  out  in  the  previous  section,  if  the  number  of  observations  n  is  sufficiently 
many  times  larger  than  the  number  of  coefficients  m,  the  SCM  is  an  accurate  estimate 
of  the  input  correlation  matrix.  However,  this  is  rarely  the  case  in  the  applications 
considered  in  this  thesis  and  especially  when  operating  in  the  underwater  acoustic 
environment. 

The problem of an insufficient number of observations might arise as a result of one
or more of the following reasons. First, the statistics of the input signal might be
non-stationary because the signal has propagated through a time-varying environment.
A typical example is the wireless communication channel. Effectively, this
means that the time interval over which the input signal can be assumed stationary
is finite and possibly short [49]. Since the adaptation is performed using the statistics
estimated from stationary observations, the number of observation vectors might not
be sufficient to accurately estimate the correlation matrix.

Second, the length of the observation interval that can be used to estimate the
time-varying statistics might not be sufficient. This typically arises in medical
applications where only a limited number of measurements are taken in a diagnostic
test.

Finally,  the  number  of  dimensions  might  be  very  large  such  that  the  number  of 
observation  vectors  is  small  compared  to  the  number  of  dimensions.  The  examples 
include  a  sonar  system  which  nowadays  might  have  hundreds  to  thousands  of  sensors 
or  a  sensor  array  in  a  modern  seismic  imaging  system  which  might  contain  several 
thousands  of  sensors  [46].  The  number  of  observation  vectors  in  such  a  scenario  is 
often  smaller  than  the  number  of  sensors. 

An important parameter we often refer to in the thesis is the ratio between the
number of dimensions m and observation vectors n,

$$c = \frac{m}{n}. \qquad (1.4)$$

Note that 1/c is the average number of observations per matrix dimension.

We say that an adaptive processor operates in a deficient sample support (also
called observation deficient) regime when the number of observations n is smaller or
not many times larger than the number of dimensions m. In terms of parameter c,
the observation deficient regime arises when c > 1 or c is not much smaller than 1.
Although not formal, the notion of deficient sample support is important and helps
in gauging the discussion in this thesis.

So far, we have pointed out that deficient sample support arises quite often in
practice, especially in the applications studied in this thesis. Here, we qualitatively
study how deficient sample support impacts the estimation accuracy of the SCM.
In doing so, the eigenvalues of a matrix are chosen to conveniently visualize and
intuitively infer if and how much the SCM departs from the corresponding ensemble
correlation matrix. Two examples are considered as an illustration.

In the first example, we assume that n observation vectors of a zero mean, unit
power white noise process are received on m sensors. The ensemble correlation matrix
R is the identity matrix of order m and therefore it has m eigenvalues equal to one.
To simulate how the eigenvalues of the SCM corresponding to a white noise process
behave, we perform the following numerical experiment [15]. A number of realizations
of SCMs corresponding to a white noise process are generated. Each SCM has order m
and is evaluated from n different observation vectors of the white noise process using
(1.2). The eigenvalues of each SCM are computed, all obtained eigenvalues are collected
and the normalized histogram, of area 1, is evaluated.

The plots of normalized histograms for m = 10 sensors and n = 20 and n = 60
observation vectors are respectively shown in the top and bottom part of Fig. 1-2.
As can be observed, the eigenvalues of the SCM are spread around the ensemble
(i.e., true) eigenvalue 1 (whose multiplicity is 10). This indicates that the SCM and
ensemble correlation matrix differ. The amount of spread gives an intuitive indication
of how much the SCM departs from the ensemble correlation matrix.

Finally, note that as the number of observations per dimension 1/c increases from
2 (top figure) to 6 (bottom figure), the eigenvalues of the SCM concentrate around
the ensemble eigenvalue.
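
The white noise experiment described above can be sketched as follows. This is an illustrative reconstruction, not the code used in the thesis: it assumes real-valued Gaussian white noise, 2000 SCM realizations and 50 histogram bins, none of which are specified in the text. As an outside reference point, it also prints the asymptotic eigenvalue support [(1 - sqrt(c))^2, (1 + sqrt(c))^2] given by the Marchenko-Pastur law for unit-power white noise with c <= 1. That law holds in the limit of large m and n, so finite-size eigenvalues may fall slightly outside it.

```python
import numpy as np

def pooled_scm_eigenvalues(m, n, trials, rng):
    """Eigenvalues of `trials` independent SCMs of unit-power white noise."""
    pooled = []
    for _ in range(trials):
        U = rng.standard_normal((m, n))      # n observation vectors on m sensors
        R_hat = U @ U.T / n                  # SCM, real-valued case of (1.2)
        pooled.append(np.linalg.eigvalsh(R_hat))
    return np.concatenate(pooled)

rng = np.random.default_rng(1)
m, trials = 10, 2000                         # trials is an arbitrary choice
spread = {}
for n in (20, 60):
    eig = pooled_scm_eigenvalues(m, n, trials, rng)
    density, edges = np.histogram(eig, bins=50, density=True)  # area-1 histogram
    c = m / n
    support = ((1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2)
    spread[n] = eig.std()
    print(f"n = {n}: c = {c:.3f}, mean = {eig.mean():.3f}, std = {spread[n]:.3f}, "
          f"observed range = [{eig.min():.2f}, {eig.max():.2f}], "
          f"asymptotic support = [{support[0]:.2f}, {support[1]:.2f}]")
```

Plotting `density` against the bin centers for each n would give histograms comparable in spirit to the two panels of Fig. 1-2, with the spread around 1 shrinking as 1/c grows from 2 to 6.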

In the second example, the input process is of zero mean and its correlation
matrix is such that it has three distinct eigenvalues: 2, 5 and 7. The process is
measured on m = 3 sensors and n = 15 stationary observation vectors are available
for computing the SCM. We perform the same numerical experiment as in the previous
example. Namely, a number of SCMs, each of order m = 3 and computed from n = 15
different observation vectors, are generated. The eigenvalues of each realization are
computed and the normalized histogram of all eigenvalues is evaluated. The histogram
is shown in Fig. 1-3.

Figure 1-2: Normalized histograms of the eigenvalues of sample correlation matrices
corresponding to zero mean, unit power, white noise process measured on m = 10
sensors. The number of observations n is 20 in the top plot (2 observations per
dimension) and 60 in the bottom plot (6 observations per dimension). The ensemble
eigenvalue is 1.

Figure 1-3: Normalized histogram of the eigenvalues of sample correlation matrices
corresponding to zero mean, colored process whose ensemble eigenvalues are 2, 5
and 7. The process is measured on m = 3 sensors and n = 15 observation vectors are
received.


The histogram plot in Fig. 1-3 shows that the eigenvalues of the SCM are spread
around the ensemble (i.e., true) eigenvalues of the input process. More specifically,
inferring the ensemble eigenvalues from the normalized histogram itself would be a
daunting task. This indicates that the SCM and ensemble correlation matrix differ.
More importantly, this happens even though the number of observation vectors is 5
times larger than the number of sensors (i.e., the number of dimensions)!
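
A minimal sketch of the colored-process experiment follows. It is not the thesis code: it assumes real Gaussian observations, a diagonal ensemble correlation matrix (the text gives only the eigenvalues 2, 5 and 7, not the eigenvectors) and 5000 realizations.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, trials = 3, 15, 5000                   # trials is an arbitrary choice
true_eigs = np.array([2.0, 5.0, 7.0])
# Assumption: diagonal ensemble correlation matrix with the stated eigenvalues
R_sqrt = np.diag(np.sqrt(true_eigs))

sample_eigs = np.empty((trials, m))
for t in range(trials):
    U = R_sqrt @ rng.standard_normal((m, n))            # n observation vectors
    sample_eigs[t] = np.linalg.eigvalsh(U @ U.T / n)    # ascending order

density, edges = np.histogram(sample_eigs.ravel(), bins=60, density=True)
mean_sorted = sample_eigs.mean(axis=0)
print("mean of sorted sample eigenvalues:", np.round(mean_sorted, 2))

# The ranges of neighboring sorted sample eigenvalues overlap across realizations
overlap_low = sample_eigs[:, 0].max() > sample_eigs[:, 1].min()
overlap_high = sample_eigs[:, 1].max() > sample_eigs[:, 2].min()
print("ranges overlap:", overlap_low, overlap_high)
```

The averages of the smallest and largest sample eigenvalues are pushed away from 2 and 7 respectively, and the ranges of neighboring sample eigenvalues overlap, which is why the pooled histogram does not show three cleanly separated clusters.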


As a final remark, we point out that in addition to an empirical result from [43]
stating that 3 observations per dimension are sufficient for the SCM to accurately
estimate the ensemble correlation matrix, the separation between distinct ensemble
eigenvalues also plays its role. Namely, as shown in the considered example, 3
observations per dimension are not sufficient in order to have the sample eigenvalues
start falling in non-overlapping segments around their ensemble counterparts. However,
this number might be sufficient if the ensemble eigenvalues were well separated.

1.3 Applications of Adaptive Processing


This  section  briefly  overviews  the  three  applications  of  adaptive  processing  that  are 
studied  in  the  thesis.  Their  unifying  feature  is  that  a  problem  of  deficient  sample 
support  often  arises  when  those  applications  operate  in  practical  settings.  This  is  in 
particular  the  case  in  underwater  acoustic  environment,  which  is  our  primary  interest. 

The notion of deficient sample support is highlighted in the overview of each
application. The illustration is provided in the last part with results obtained from
processing the underwater acoustic data collected in a field experiment.

1.3.1  Adaptive  Spatial  Spectrum  Estimation 

Spatial power spectrum estimation consists of estimating the power received at an
array of sensors as a function of direction of arrival.^3 This is usually used in the
context of estimating the number of point source signals embedded in the signal that
is received at an array and then estimating their direction of arrival and power. Due
to the time-varying nature of the environment, these quantities are changing in time
and an adaptive beamformer tracks the spatial spectrum in real time.

A block diagram of an adaptive beamformer is shown in Fig. 1-4. An array
of sensors is spatially sampling the received signal. The Fourier transform of the
signal received on each sensor is computed. The Fourier coefficients across the array
corresponding to a particular frequency bin of interest form the observation vector for
an adaptive processor. These observation vectors are often called snapshots.

The snapshots are processed through an adaptive processor and its output is an
estimate of the spatial spectrum in a particular direction and frequency bin of interest.
The adaptive processor is a linear, time-varying spatial filter which contains a single
tap per sensor. The coefficients of the processor are computed and adapted based
on the estimated statistics of the received signal [52]. The number of coefficients being
adapted is equal to the number of sensors and is denoted by m.

The number of stationary snapshots n, used to compute the SCM and in turn
the processor coefficients, is finite and often limited due to the time-varying nature
of the environment. Therefore, the adaptive processor often operates in the sample
deficient regime.

^3 Note that the spatial spectrum can be mapped into the wavenumber spectrum.

Figure 1-4: Adaptive spatial spectrum estimation.
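
To make the roles of m, n and the SCM concrete, the following sketch estimates a spatial spectrum from a small number of snapshots. This section does not name a specific estimator, so the diagonally loaded Capon (minimum variance) form 1 / (v^H (R_hat + delta I)^{-1} v) is used here as one conventional choice. The uniform line array with half-wavelength spacing, the angle measured from broadside, the single source at 20 degrees, the source power and the loading value are all assumptions made only for illustration.

```python
import numpy as np

def steering_vector(m, theta, spacing=0.5):
    """Plane-wave response of an m-sensor uniform line array.

    theta is measured from broadside (radians); spacing is in wavelengths.
    """
    sensor_index = np.arange(m)
    return np.exp(2j * np.pi * spacing * sensor_index * np.sin(theta))

def loaded_capon_spectrum(snapshots, thetas, loading):
    """Diagonally loaded Capon power estimate for each look direction."""
    m, n = snapshots.shape
    R_hat = snapshots @ snapshots.conj().T / n           # SCM of the snapshots
    R_inv = np.linalg.inv(R_hat + loading * np.eye(m))   # regularized inverse
    power = np.empty(len(thetas))
    for i, theta in enumerate(thetas):
        v = steering_vector(m, theta)
        power[i] = 1.0 / np.real(v.conj() @ R_inv @ v)
    return power

rng = np.random.default_rng(3)
m, n = 12, 18                                  # c = m/n = 2/3, few snapshots
theta_src = np.deg2rad(20.0)                   # hypothetical source direction
source = np.sqrt(10 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
noise = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
snapshots = np.outer(steering_vector(m, theta_src), source) + noise

thetas = np.deg2rad(np.linspace(-90.0, 90.0, 361))
spectrum = loaded_capon_spectrum(snapshots, thetas, loading=1.0)
peak_deg = np.rad2deg(thetas[np.argmax(spectrum)])
print(f"spectrum peak at {peak_deg:.1f} degrees")
```

The loading value of 1.0 is arbitrary. Rerunning with different values of `loading` and `n` gives a feel for how the estimate depends on regularization when n is not much larger than m.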


1.3.2 Time-Varying Channel Tracking

A block diagram of a channel tracking problem framed as an adaptive processing
problem is shown in Fig. 1-5. The unknown channel is modeled as a linear,
time-varying finite impulse response (FIR) filter. Observations of the signal that has
passed through the channel are contaminated with observation noise. The adaptive
processor has access to channel inputs and noisy channel outputs. The estimated
channel impulse response (also called channel vector) is updated at discrete time t
based on the estimated channel impulse response at time t - 1 and the channel input
and observed noisy output at time t [27], [41].

Following  the  terminology  introduced  in  Section  1.1,  the  observation  vector  at 
discrete  time  t  is  formed  from  the  input  data  samples  which  impact  the  channel 
output  at  time  t.  The  number  of  these  samples  is  equal  to  the  number  of  unknown 
channel  coefficients  and  is  denoted  with  m.  Note  that  this  is  the  dimensionality  of 
the observation space.

Figure 1-5: Channel tracking.


The number of observation vectors n, viewed in the context of the previous section,
is tied with how rapidly the channel varies in time. More specifically, even though the
channel input might originate from a stationary process, the adaptive processor might
still operate in the sample deficient regime. Namely, the channel impulse response
at a particular time instant is estimated from the inputs and noisy outputs observed
during the time interval over which the channel is approximately time-invariant such
that the output data is approximately stationary.
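
The recursive update described in this subsection can be sketched with an exponentially weighted recursive least squares (RLS) tracker. This is an illustrative example under stated assumptions, not the algorithm analyzed in the thesis: the signals are real-valued, the 4-tap test channel is time-invariant, and the forgetting factor, initialization constant `delta` and the function name `rls_track` are hypothetical choices.

```python
import numpy as np

def rls_track(x, y, m, forgetting=0.98, delta=100.0):
    """Exponentially weighted RLS estimate of an m-tap FIR channel.

    x: channel input samples, y: noisy channel outputs (real-valued here).
    Returns the channel estimate after every time step, shape (len(x), m).
    """
    h_hat = np.zeros(m)
    P = delta * np.eye(m)          # running inverse of the weighted correlation
    u = np.zeros(m)                # observation vector: the m most recent inputs
    history = np.zeros((len(x), m))
    for t in range(len(x)):
        u = np.concatenate(([x[t]], u[:-1]))
        gain = P @ u / (forgetting + u @ P @ u)
        error = y[t] - h_hat @ u   # prediction error with the estimate from t - 1
        h_hat = h_hat + gain * error
        P = (P - np.outer(gain, u) @ P) / forgetting
        history[t] = h_hat
    return history

rng = np.random.default_rng(4)
h_true = np.array([1.0, -0.5, 0.25, 0.1])            # hypothetical channel
x = rng.choice([-1.0, 1.0], size=400)                # binary training input
y = np.convolve(x, h_true)[:len(x)] + 0.01 * rng.standard_normal(len(x))
history = rls_track(x, y, m=len(h_true))
final_error = np.linalg.norm(history[-1] - h_true)
print(f"channel estimation error after {len(x)} samples: {final_error:.4f}")
```

A forgetting factor below 1 discounts old observations, so the effective number of observation vectors is limited (a common rule of thumb is roughly 1 / (1 - forgetting)) even when the data record is long. This is one way the ratio between m and the number of usable observations becomes a design parameter in tracking.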

1.3.3  Communications  Channel  Equalization 

The wireless communications channels through which signals are transmitted
are often time-varying and characterized by multipath propagation, which results
in intersymbol interference and Doppler spreading of the signal. Most techniques
developed for mitigating these effects rely in part or completely on channel
equalization.

A block diagram of an equalizer is shown in Fig. 1-6. The equalizer inputs are
samples of the received signal. The equalizer is a processor which coherently
combines the received signal energy and consequently compensates for the intersymbol
interference introduced in the channel. A decision device estimates the transmitted
symbols from the equalizer outputs [41].

The equalizer coefficients depend on the channel impulse response and are therefore
adapted according to channel variations. The block diagram in Fig. 1-6 corresponds
to an equalizer with direct adaptation. Depending on the position of the switch, an
equalizer operates in either training or decision directed mode. The coefficients of
an equalizer operating in the decision directed mode are, at discrete time t, updated
based on the received signal and the difference between the detected symbol and its
soft estimate at discrete time t - 1. On the other hand, the transmitted symbols are
known a priori when an equalizer operates in the training mode, such that the
coefficients are updated based on the received signal and the difference between the
transmitted symbol and its soft estimate.
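
The switch between training and decision directed mode can be sketched in a few lines. For brevity this example swaps in a linear equalizer with a least mean squares (LMS) update, which is not the multi-channel decision feedback structure studied in the thesis. The binary symbols, the two-tap channel, the noise level, the step size and the function name `lms_equalizer` are all assumptions made only for illustration.

```python
import numpy as np

def lms_equalizer(received, training, taps=5, step=0.02):
    """Linear equalizer: training mode first, then decision directed mode."""
    w = np.zeros(taps)
    u = np.zeros(taps)                     # observation vector of recent samples
    decisions = np.zeros(len(received))
    for t in range(len(received)):
        u = np.concatenate(([received[t]], u[:-1]))
        soft = w @ u                       # soft estimate of the symbol at time t
        decisions[t] = 1.0 if soft >= 0 else -1.0
        # Switch position: known symbol while training, detected symbol afterwards
        reference = training[t] if t < len(training) else decisions[t]
        w = w + step * (reference - soft) * u
    return decisions, w

rng = np.random.default_rng(5)
symbols = rng.choice([-1.0, 1.0], size=1000)
channel = np.array([1.0, 0.3])             # hypothetical mild multipath channel
received = np.convolve(symbols, channel)[:len(symbols)]
received = received + 0.05 * rng.standard_normal(len(symbols))

n_train = 500
decisions, w = lms_equalizer(received, symbols[:n_train])
errors = int(np.sum(decisions[n_train:] != symbols[n_train:]))
print(f"symbol errors in decision directed mode: {errors}")
```

In decision directed mode the update is only as good as the detected symbols, so in harsher channels than this benign one a wrong decision can feed back into the coefficients.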

The  number  of  equalizer  coefficients  m  is  the  number  of  dimensions.  The  estimate 
of  input  correlation  matrix  is  essential  in  the  computation  and  adaptation  of  equalizer 
coefficients  when  the  objective  function  is  based  on  second  order  statistics. 

In the context of Section 1.1, the observation vector at discrete time t is a
vector of appropriately arranged input samples of the received signal which impact the
detection of a transmitted symbol at time t. The number of stationary observation
vectors n, used in the computation of the SCM, depends on how rapidly the
transmission channel varies in time. The relative ratio between the number of equalizer
coefficients m and number of observation vectors n might be such that the adaptation
is performed with deficient sample support.

1.3.4  Underwater  Acoustic  Environment 

The coefficients of adaptive processors described in the previous section are quite
often adapted with deficient sample support. The causes might be a time-varying
channel, a non-stationary environment, a large number of sensors or their combinations.
This is especially the case in the underwater acoustic setting where the underwater
acoustic signals are adaptively processed. This in fact is our main motivation for
studying adaptive processing with deficient sample support. The particular applications
are underwater acoustic communications and passive sonar.

Figure 1-6: Channel equalizer with direct adaptation.

The  main  challenges  associated  with  adaptive  processing  of  underwater  acoustic 
signals  arise  from  the  time  variability  of  the  underwater  acoustic  environment  [2],  [51]. 
The  time  variability  of  the  environment  is  caused  by  unavoidable  drifts  of  the  receiver, 
motions  of  the  sources,  surface  and  internal  waves,  and  other  moving  objects  such 
as  fish.  These  result  in  finite  and  often  short  coherence  intervals  during  which  the 
observations  are  statistically  stationary.  The  time  variability  of  the  environment  is 
usually  quantified  by  the  scattering  function  [58]. 

In addition to causing deficient sample support, an underwater acoustic
environment poses a number of other challenges on shallow and deep water underwater
acoustic system design [51], [2]. In particular, the motions in the underwater acoustic
environment together with a relatively small speed of propagation (nominally 1500 m/s)
result in relatively large Doppler spread of the received signals. Further, the acoustic
waves exhibit multiple bounces off the surface and bottom in shallow waters, which
results in long delay spread of the received signal. The delay spread of the
communications signals may extend over several tens to hundreds of transmitted symbols.
Also, the attenuation of the underwater sound is frequency dependent. Finally, the
ambient noise is non-Gaussian and correlated in space and time.
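
To see why the low propagation speed matters for Doppler effects, consider a rough worked example. The relative velocity v = 1 m/s and the carrier frequency f_c = 10 kHz are hypothetical values chosen only for illustration, and the sound speed is written c_s to avoid confusion with the ratio c = m/n. Using the narrowband approximation for the Doppler shift of a single path:

```latex
\frac{v}{c_s} = \frac{1~\mathrm{m/s}}{1500~\mathrm{m/s}} \approx 6.7 \times 10^{-4},
\qquad
f_d \approx f_c \, \frac{v}{c_s} = 10^{4}~\mathrm{Hz} \times 6.7 \times 10^{-4} \approx 6.7~\mathrm{Hz}.
```

For comparison, the same velocity in a radio channel gives v / c_0 = 1 / (3 x 10^8), approximately 3.3 x 10^{-9}, which is about five orders of magnitude smaller. Doppler spread arises when different propagation paths experience different shifts of this kind.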

To illustrate the time-variability of the underwater acoustic environment and
challenges associated with processing the underwater acoustic signals, some results
obtained from processing the acoustic signals measured in a field experiment are
presented. The field experiment, labeled KAM11, took place in the shallow water off the
coast of Hawaii in summer 2011 with one goal being to study the underwater acoustic
communication channel [28].

The impulse response of a time-varying underwater acoustic channel estimated from
the data collected in the field experiment is shown in Fig. 1-7. The horizontal and
vertical axes represent the time and delay, respectively. Therefore, a vertical slice in
Fig. 1-7 is the channel impulse response at the corresponding time instant. As can be
observed, the channel impulse response exhibits time-variations even on short time
scales.

The acoustic signals recorded in the field experiment are received on a linear,
vertical, uniformly spaced 24-sensor array. The distributions of the received acoustic
energy over the space of elevation angles and delays at two different time instants are
shown in Fig. 1-8. The elevation angle is defined with respect to the array such that
90° corresponds to the broadside of the array and 0° corresponds to signals traveling
up from below. The time difference between the plots in the top and bottom part is 45
seconds. The arrival structure of the received signal fluctuates in time and while the
number of arrivals and their directions do not vary significantly in the time interval
of 45 seconds, the amount of energy associated with different arrivals changes in this
time period.


1.4  Thesis  Objectives 

This  thesis  studies  the  problem  of  adaptive  processing  in  the  deficient  sample  support 
regime.  The  applications  of  adaptive  processing  considered  are  adaptive  beamforming 
for  spatial  spectrum  estimation,  tracking  of  time-varying  channels  and  equalization 
of  communication  channels.  The  computation  and  adaptation  of  coefficients  in  the 
considered  adaptive  processors  are  based  on  the  estimates  of  second  order  statistics 
of  the  data.  The  unifying  feature  of  the  considered  applications  is  that  the  number 
of  observations  is  quite  often  insufficient  to  accurately  estimate  the  second  order 
statistics. This is especially the case in the underwater acoustic environment, which is
our main motivation. However, the issues associated with adaptation in the deficient
sample support regime often arise in many other practical applications and settings.

Figure 1-7: Impulse response of an underwater acoustic communication channel
estimated from the KAM11 field data. The color scale is in dB.

Figure 1-8: Distribution of received acoustic energy versus elevation angle and delay
at the initial time instant (top plot) and after 45 seconds (bottom plot). The data
used to generate these plots was collected in the KAM11 field experiment. The color
scale is in dB.

The main tool used for studying the adaptation with deficient sample support is
random matrix theory. Random matrix theory characterizes the eigenvalues and
eigenvectors of different classes of random matrices. A sample correlation matrix,
which estimates the correlation structure of the input process, is the random matrix
of our interest. Consequently, the thesis can be viewed as an application of random
matrix theory methods for addressing the problems of adaptive processing.

In short, the thesis analyzes the performance of the considered adaptive processors
when operating in the deficient sample support regime. In addition, it gains insights
into the behavior of different estimators based on the estimated second order statistics
of the data originating from a time-varying environment. Finally, it studies how to
optimize the adaptive processors and algorithms so as to account for deficient sample
support and consequently improve the performance.


1.5  Organization  of  the  Thesis 

The  thesis  is  organized  as  follows. 

Chapter 2 presents background on random matrix theory methods and evaluates
important quantities needed for the performance analysis in later chapters.
More specifically, the eigenvalue density functions, eigenvalue and eigenvector Stieltjes
transforms and moments of a random matrix are defined. The random matrix models
which describe sample correlation matrices used in the thesis are presented and
important theorems which characterize Stieltjes transforms corresponding to these models
are stated. Using these characterizations, the moments of the considered sample
correlation matrices are evaluated. In addition, two important results which characterize
the expectation and variance of functionals of Gaussian matrices are stated. Finally,
the chapter is concluded with a discussion of how the asymptotic random matrix
theory results are used in practical, non-asymptotic, scenarios.

Chapter 3 considers the problem of spatial power spectrum estimation with an
array of sensors in the deficient sample support regime. More specifically, the problem
of regularization¹ in the form of diagonal loading of a sample correlation matrix, used
in two spatial power spectrum estimators, is studied. In particular, the asymptotic
behavior of two spatial power estimators, their expectations, variances and MSEs
are analyzed in the limit when the number of snapshots and number of sensors grow
large at the same rate. Due to rapid convergence, the limiting values accurately
approximate the corresponding quantities for a finite number of sensors and snapshots.
Further, the study of the dependence of the bias and variance of the power
estimators on diagonal loading leads to a conjecture that the variance has negligible
impact on the value of the optimal diagonal loading which minimizes the MSE. The
behavior of the optimal diagonal loading when the arrival process is composed of plane waves
embedded in uncorrelated noise is investigated. Finally, the MSE and sensitivity
performances of the optimized power estimators are compared.

Chapter 4 presents a performance study of the RLS algorithm when it is used
to track a channel which varies according to a first order Markov process. The
expressions for signal prediction and channel estimation mean square errors (MSE) are
derived and validated via simulations. The general results are applied to specific
scenarios, and as special cases the behavior in the steady state, the performance of
LS-based identification of a linear time-invariant channel and the performance of the sliding
window RLS algorithm are considered. Finally, several practical results, such as those
characterizing the optimal exponential forgetting factor in the exponentially weighted
RLS or the optimal averaging window length in the sliding window RLS algorithm, are
obtained.

Chapter 5 presents a performance study of the least squares based multi-channel
Decision Feedback Equalizer when the transmission channel is non-stationary and
modeled as a frequency selective filter which is time-invariant over only short time
intervals. The expression for signal prediction MSE is derived and validated via
Monte-Carlo simulations. Further, it is elaborated that the optimal number of equalizer
coefficients is a trade-off between two competing requirements such that an equalizer with
relatively short constituent filters can outperform one using longer filters. Finally, the
impact of the number of sensors and separation between them on the equalization
performance of a time-varying underwater acoustic communication channel is studied.
The insights concerning the optimal selection of the number of and separation
between sensors as well as the lengths of the constituent filters are validated using the
data collected in a field experiment.

¹Also called Tikhonov regularization and relaxation.

Chapter 6 summarizes the context and adaptive processing problems addressed
in this thesis, highlights the contributions and suggests possible future work.

Chapter 2


Random  Matrix  Theory  Methods 

2.1  Introduction 

Random  matrix  theory  is  a  mathematical  area  concerned  with  the  characterization  of 
random  matrices.  It  is  usually  classified  into  small  and  large  dimensional  (asymptotic) 
random  matrix  theory. 

The small dimensional theory studies random matrices of finite dimension. It
includes the very first random matrix theory result, leading
to the joint probability density function of the entries of the Wishart matrix, which is
effectively the SCM of a Gaussian observation process [61]. A brief overview of the
relevant results in small dimensional theory is given in [13]. A compact survey of the
relevant results from the numerical analysis perspective, along with a broader
perspective of the subject and its intimate connections with orthogonal polynomials,
is given in [16].

The large dimensional theory studies how different transforms of eigenvalues and
eigenvectors behave asymptotically when the order of an underlying random matrix
grows. The birth of large dimensional theory is usually attributed to the Wigner
semi-circle law [59], [60] and the Marčenko-Pastur law [33]. Since then, the asymptotic
behavior of random matrices has been extensively studied. Overviews of the
results relevant for applications in engineering areas are given in [54] and Part I
of [13]. The large dimensional random matrix theory is of interest in this thesis and
we simply refer to it as random matrix theory.

Random matrix theory has been successfully applied in information theory and
wireless communications since the first result reported in [53]. An overview of random
matrix theory applications for studying a variety of communication techniques from
the information theory perspective is given in Part II of [13]. Random matrix
theory has also been applied in detection and estimation, for example for developing new
algorithms for detecting and estimating the number of sources [42], [30].

This thesis exploits random matrix theory insights and results in the study
of adaptive processing problems associated with adaptation in the deficient sample
support regime. More specifically, we study the problems of time-varying channel
estimation, diagonal loading for spatial spectrum estimation with a small number of
snapshots and equalization of time-varying wideband communication channels.
Random matrix theory is a convenient tool used in the analysis of these problems.
Different performance metrics are asymptotically characterized and the obtained
expressions are used to approximate the cases of practical interest.

This chapter introduces the fundamental concepts in random matrix theory, presents
the results relevant for our analysis and evaluates important quantities whose
characterization is necessary for the study of adaptive processing problems in the rest of
the thesis.

The rest of the chapter is organized as follows. Section 2.2 defines fundamental
quantities in random matrix theory. Section 2.3 defines eigenvalue and eigenvector
Stieltjes transforms and presents two important theorems which characterize the
asymptotic behavior of these transforms for the random matrix models of our interest.
Section 2.4 defines limiting moments of a random matrix and elaborates how they
can be computed from the Stieltjes transform. The applications of these concepts
and results to sample correlation matrix (SCM) models are presented in Section 2.5.
In particular, the moments of an exponentially weighted and a rectangularly windowed
SCM are evaluated and the inverse of the SCM is characterized. Also, important
results corresponding to the SCM of a white noise process are separately presented.
Section 2.6 presents two important results that constitute the Gaussian method. This
chapter is concluded with Section 2.7, which details how the asymptotic results are
used and interpreted in practical applications.


2.2  Limiting  Eigenvalue  Density  Function 

The eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$ of an m-by-m random Hermitian matrix
$\mathbf{A}_m$ can be encoded with a so-called empirical Eigenvalue Distribution Function
$G^{\mathbf{A}_m}(x)$. This function is defined as the cumulative distribution function of the
discrete uniform random variable which takes values equal to the eigenvalues of
$\mathbf{A}_m$. That is,

$$G^{\mathbf{A}_m}(x) = \frac{1}{m} \sum_{k=1}^{m} I_{(-\infty, x]}(\lambda_k), \tag{2.1}$$

where $I_{[a,b]}(x) = 1$ for $a \le x \le b$ and is zero otherwise.

The derivative of $G^{\mathbf{A}_m}(x)$ is the empirical Eigenvalue Density Function
$\mu_{\mathbf{A}_m}(x)$, given by

$$\mu_{\mathbf{A}_m}(x) = \frac{1}{m} \sum_{k=1}^{m} \delta_d(x - \lambda_k), \tag{2.2}$$

where $\delta_d(x)$ is a Dirac delta function.

For some random matrix ensembles, when $m \to \infty$, the empirical Eigenvalue
Density Function $\mu_{\mathbf{A}_m}(x)$ converges to a non-random limiting Eigenvalue Density
Function $\mu_{\mathbf{A}}(x)$. The convergence is almost sure and the support of the limiting
function is compact.
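As a small illustration of definition (2.1), the following Python sketch evaluates the
empirical Eigenvalue Distribution Function from a list of eigenvalues. The function name
and the eigenvalues are hypothetical and chosen only for illustration; they do not come
from the thesis.

```python
def empirical_edf(eigenvalues, x):
    """Empirical eigenvalue distribution function (2.1):
    the fraction of eigenvalues that are less than or equal to x."""
    m = len(eigenvalues)
    return sum(1 for lam in eigenvalues if lam <= x) / m


# Hypothetical eigenvalues of a 4-by-4 Hermitian matrix (illustrative only).
eigs = [0.5, 1.0, 1.0, 3.0]
for x in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(x, empirical_edf(eigs, x))
```

The function is a staircase that jumps by 1/m at every eigenvalue (by 2/m at a repeated
one), which is why its derivative (2.2) is a sum of Dirac delta functions.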

An example of a random matrix model whose empirical eigenvalue density function
converges under some relatively mild conditions, often satisfied in practice, is a sample
correlation matrix (SCM) corresponding to a white noise process. Such a limiting
eigenvalue density function is given in closed form and is widely known as the
Marčenko-Pastur law [33]. This result is presented in Section 2.5.3.

The SCM corresponding to a white noise process is one of the few random matrix
ensembles whose limiting Eigenvalue Density Function can be expressed in closed
form. This is the motivation for introducing other ways of encoding the eigenvalues
of random matrices.


2.3  The  Stieltjes  Transform 

A possible way to encode the eigenvalues of a random matrix is via the Stieltjes
transform. The Stieltjes transform is particularly useful in the theoretical analysis
of adaptive processing with deficient sample support. This section defines the
Stieltjes transform and presents important asymptotic characterizations used later in the
analysis.


2.3.1  The  Eigenvalue  and  Eigenvector  Stieltjes  Transform 


In general, the Stieltjes transform $S(z)$ encodes a real-valued density function $\mu(x)$
such that

$$S(z) = \int \frac{1}{x - z}\, \mu(x)\, dx, \tag{2.3}$$

where the integration is implied over the support of $\mu(x)$. The Stieltjes transform is
formally defined for $\mathrm{Im}\{z\} > 0$ ($\mathrm{Im}\{\cdot\}$ denotes the imaginary part of a complex number),
so that the above integral does not have singularities.

The Stieltjes transform is intimately related to the Hilbert and Cauchy transforms.
Namely, the principal value of the scaled Stieltjes transform is the Hilbert transform.
Also, the negative of the Stieltjes transform is the Cauchy transform.

Given the Stieltjes transform $S(z)$, the density function $\mu(x)$ is recovered
using the Stieltjes-Perron inversion formula [13]

$$\mu(x) = \frac{1}{\pi} \lim_{\epsilon \to 0^+} \mathrm{Im}\{S(x + i\epsilon)\}. \tag{2.4}$$
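The inversion formula can be checked numerically on a density whose Stieltjes transform
is known in closed form. The sketch below is an illustration that is not part of the
thesis: it uses the uniform density on [0, 1], for which the integral (2.3) evaluates to
log(1 - z) - log(-z), and replaces the limit in (2.4) by a small fixed value of the
imaginary part. The function names and the value of `eps` are assumptions made for this
example.

```python
import cmath
import math


def stieltjes_uniform(z):
    """Stieltjes transform (2.3) of the uniform density on [0, 1]:
    the integral of 1/(x - z) over [0, 1] equals log(1 - z) - log(-z)
    for Im{z} > 0 (principal branch of the logarithm)."""
    return cmath.log(1 - z) - cmath.log(-z)


def density_from_stieltjes(S, x, eps=1e-6):
    """Approximate the inversion formula (2.4) with a small fixed eps > 0."""
    return S(complex(x, eps)).imag / math.pi


for x in (-1.0, 0.25, 0.5, 2.0):
    print(x, density_from_stieltjes(stieltjes_uniform, x))
```

Inside the support the recovered value is close to 1, and outside the support it is close
to 0, as expected for the uniform density.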


The  Stieltjes  Transform  for  Eigenvalues 

In the context of random matrix theory, the Stieltjes transform corresponding to the
empirical Eigenvalue Density Function $\mu_{\mathbf{A}_m}(x)$ is, after substituting (2.2) into (2.3),
given by

$$S_{\mathbf{A}_m}(z) = \frac{1}{m} \sum_{k=1}^{m} \frac{1}{\lambda_k - z}, \tag{2.5}$$

where $z$ is a complex number with positive imaginary part. We refer to $S_{\mathbf{A}_m}(z)$ as
the empirical Stieltjes transform. Recalling that the trace of a matrix is equal to the
sum of its eigenvalues, $S_{\mathbf{A}_m}(z)$ is expressed as

$$S_{\mathbf{A}_m}(z) = \frac{1}{m} \mathrm{tr}\left\{ (\mathbf{A}_m - z\mathbf{I})^{-1} \right\}. \tag{2.6}$$

For some random matrix ensembles, when $m \to \infty$, the empirical Stieltjes
transform $S_{\mathbf{A}_m}(z)$ almost surely converges to a non-random limiting Stieltjes transform
$S_{\mathbf{A}}(z)$. A sample correlation matrix is a random matrix ensemble whose limiting
Stieltjes transform exists under some relatively mild conditions, often satisfied in
practice.

The limiting Stieltjes transform $S_{\mathbf{A}}(z)$ is the Stieltjes transform (2.3)
corresponding to the limiting Eigenvalue Density Function $\mu_{\mathbf{A}}(x)$. Therefore, $\mu_{\mathbf{A}}(x)$ can be
obtained from $S_{\mathbf{A}}(z)$ using the Stieltjes-Perron inversion formula (2.4). This is in fact
the essence of the Stieltjes transform method [13].

The later developments exploit a simple relation between the Stieltjes transforms
of two matrices related via a linear transformation. Assuming a matrix $\mathbf{B}_m$ is obtained
from $\mathbf{A}_m$ as

$$\mathbf{B}_m = a\mathbf{A}_m + b\mathbf{I}, \tag{2.7}$$

where $a \neq 0$, the Stieltjes transforms $S_{\mathbf{B}_m}(z)$ and $S_{\mathbf{A}_m}(z)$ are related as

$$S_{\mathbf{B}_m}(z) = \frac{1}{a} S_{\mathbf{A}_m}\left( \frac{z - b}{a} \right). \tag{2.8}$$

This result follows directly from (2.6). In addition, if $S_{\mathbf{A}_m}(z)$ converges as
$m \to \infty$, the relation between the limiting Stieltjes transforms remains the same.
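The scaling relation between the two Stieltjes transforms can be verified directly from
the eigenvalues, since the eigenvalues of a*A + b*I are a*lambda_k + b. The following
Python sketch does this for a hypothetical set of eigenvalues; all numerical values and
names are assumptions chosen for illustration.

```python
def stieltjes_from_eigs(eigs, z):
    """Empirical Stieltjes transform: (1/m) * sum_k 1/(lambda_k - z)."""
    return sum(1.0 / (lam - z) for lam in eigs) / len(eigs)


# Hypothetical eigenvalues of A_m and parameters of the linear transformation.
eigs_a = [0.5, 1.0, 2.5]
a, b = 2.0, 0.3
eigs_b = [a * lam + b for lam in eigs_a]   # eigenvalues of B_m = a*A_m + b*I

z = complex(0.7, 0.2)                      # any z with positive imaginary part
lhs = stieltjes_from_eigs(eigs_b, z)             # S_B(z)
rhs = stieltjes_from_eigs(eigs_a, (z - b) / a) / a   # (1/a) * S_A((z - b)/a)
print(lhs, rhs)
```

The two printed values agree up to floating point rounding.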

The empirical and limiting Stieltjes transforms encode the corresponding densities
of the eigenvalues. Therefore, we refer to these transforms as eigenvalue Stieltjes
transforms. The Eigenvector Stieltjes Transform is introduced in the following section.


The  Stieltjes  Transform  for  Eigenvectors 

Traditionally, the behavior of the eigenvectors of a random matrix is studied by
analyzing the orthogonal projections. As such, it is known that the eigenvectors
of the finite dimensional SCM corresponding to a Gaussian distributed, zero mean,
unit power, white noise process are uniformly distributed on the unit sphere. In
the effort to generalize this result without constraining the process to be Gaussian, the
projection of a random vector onto the subspace spanned by the eigenvectors of the
SCM of a white noise process is studied [13]. More recently, the angle between an
eigenvector of a random matrix and the ensemble eigenvector of the underlying process
is asymptotically characterized in [6].

The  eigenvectors  of  a  random  matrix  are  also  studied  implicitly  via  quadratic 
forms.  This  approach  is  more  suitable  for  the  applications  considered  in  this  thesis 
and  is  introduced  here. 

Given deterministic vectors $\mathbf{s}_1$ and $\mathbf{s}_2$, the eigenvalues and eigenvectors of a random
matrix $\mathbf{A}_m$ are encoded via [21]

$$F_{\mathbf{A}_m}(z) = \mathbf{s}_1^H (\mathbf{A}_m - z\mathbf{I})^{-1} \mathbf{s}_2, \tag{2.9}$$

where $\mathrm{Im}\{z\} \neq 0$ so that $F_{\mathbf{A}_m}(z)$ has no singularities. We refer to the function $F_{\mathbf{A}_m}(z)$ as
the empirical Eigenvector Stieltjes Transform, although it depends on the eigenvalues as
well. Note that $F_{\mathbf{A}_m}(z)$ depends on the vectors $\mathbf{s}_1$ and $\mathbf{s}_2$. However, we omit showing
this dependence explicitly in order to keep the notation uncluttered.

For some random matrix ensembles, when $m \to \infty$, the empirical Eigenvector
Stieltjes Transform $F_{\mathbf{A}_m}(z)$ almost surely converges to a non-random limiting
eigenvector Stieltjes transform $F_{\mathbf{A}}(z)$. A sample correlation matrix is a random matrix
ensemble whose limiting Stieltjes transform exists under some relatively mild
conditions, often satisfied in practice.

The empirical eigenvector Stieltjes transforms corresponding to matrices $\mathbf{B}_m$ and
$\mathbf{A}_m$ related via the linear transformation

$$\mathbf{B}_m = a\mathbf{A}_m + b\mathbf{I}, \tag{2.10}$$

where $a \neq 0$, are related as

$$F_{\mathbf{B}_m}(z) = \frac{1}{a} F_{\mathbf{A}_m}\left( \frac{z - b}{a} \right). \tag{2.11}$$

This result follows directly from (2.9). In addition, if $F_{\mathbf{A}_m}(z)$ converges as
$m \to \infty$, the limiting eigenvector Stieltjes transforms are related in the same way.
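The same scaling property can be checked for the quadratic form (2.9). The sketch below
is only an illustration: it uses a hypothetical 2-by-2 real symmetric matrix so that the
inverse can be written explicitly, and all names and numerical values are assumptions.

```python
def quad_form_resolvent(A, s1, s2, z):
    """Evaluate s1^H (A - z*I)^{-1} s2 for a 2-by-2 matrix A via the explicit inverse."""
    a11, a12 = A[0][0] - z, A[0][1]
    a21, a22 = A[1][0], A[1][1] - z
    det = a11 * a22 - a12 * a21
    # y = (A - z*I)^{-1} s2, using the adjugate formula for a 2-by-2 inverse
    y0 = (a22 * s2[0] - a12 * s2[1]) / det
    y1 = (-a21 * s2[0] + a11 * s2[1]) / det
    return s1[0].conjugate() * y0 + s1[1].conjugate() * y1


# Hypothetical matrix, vectors and transformation parameters.
A = [[2.0, 0.5], [0.5, 1.0]]
s1, s2 = [1.0, -0.5], [0.3, 2.0]
a, b = 3.0, -0.4
B = [[a * A[i][j] + (b if i == j else 0.0) for j in range(2)] for i in range(2)]

z = complex(0.2, 0.6)
F_B = quad_form_resolvent(B, s1, s2, z)
F_A_scaled = quad_form_resolvent(A, s1, s2, (z - b) / a) / a
print(F_B, F_A_scaled)
```

Both evaluations give the same complex number up to rounding, in agreement with the
relation between the transforms of the two matrices.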


2.3.2  The  Eigenvalue  Stieltjes  Transform  for  an  important 
model 

The  limiting  Eigenvalue  Stieltjes  Transform  of  an  important  class  of  random  matrices 
is  characterized  with  the  following  theorem  [14].  The  random  matrix  model  considered 
in  this  theorem  is  essential  for  our  study  because  it  has  a  similar  form  to  the  SCM 
model  used  in  the  analysis. 

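Theorem 2.1. Let m and n be positive integers and let

$$\mathbf{A}_m = \frac{1}{n} \mathbf{R}^{1/2} \mathbf{X} \mathbf{T} \mathbf{X}^H \mathbf{R}^{1/2} \tag{2.12}$$

be an m-by-m matrix with the following hypotheses:

1. $\mathbf{X}$ is an m-by-n matrix with i.i.d. complex entries of zero mean and unit
variance,

2. $\mathbf{R}^{1/2}$ is an m-by-m Hermitian positive semi-definite square root of the positive
semi-definite Hermitian matrix $\mathbf{R}$,

3. $\mathbf{T} = \mathrm{diag}(\tau_1, \ldots, \tau_n)$ with $\tau_i \ge 0$ for all positive integers i,

4. the sequences $\{G^{\mathbf{T}}(x)\}_{n=1}^{\infty}$ and $\{G^{\mathbf{R}}(x)\}_{m=1}^{\infty}$ are tight, i.e., for all $\epsilon > 0$, there
exists $M > 0$ such that $G^{\mathbf{T}}(M) > 1 - \epsilon$ and $G^{\mathbf{R}}(M) > 1 - \epsilon$ for all n, m,

5. there exist $b > a > 0$ for which $a \le \liminf_{m \to \infty} c \le \limsup_{m \to \infty} c \le b$ with
$c = m/n$.

Then, as m and n grow large with ratio c, the empirical Eigenvalue Stieltjes Transform
of $\mathbf{A}_m$, $S_{\mathbf{A}_m}(z)$, almost surely converges to the limiting Eigenvalue Stieltjes Transform,
$S_{\mathbf{A}}(z)$, where

$$S_{\mathbf{A}}(z) = \frac{1}{m} \mathrm{tr}\left\{ \left( \int \frac{\tau \mu_{\mathbf{T}}(\tau)\, d\tau}{1 + c\tau e(z)}\, \mathbf{R} - z\mathbf{I} \right)^{-1} \right\} \tag{2.13}$$

and the function $e(z)$ is the unique solution of the equation

$$e(z) = \frac{1}{m} \mathrm{tr}\left\{ \mathbf{R} \left( \int \frac{\tau \mu_{\mathbf{T}}(\tau)\, d\tau}{1 + c\tau e(z)}\, \mathbf{R} - z\mathbf{I} \right)^{-1} \right\} \tag{2.14}$$

such that the signs of the imaginary parts of $e(z)$ and $z$ coincide if $\mathrm{Im}\{z\} \neq 0$ and
$e(z) > 0$ if $z$ is real and negative.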

Note that the limiting Stieltjes transform $S_{\mathbf{A}}(z)$ is expressed in terms of the
function $e(z)$, which is given as the unique solution to the fixed point equation (2.14). The
fixed point equation can be solved numerically, for example, by fixing $z$ and solving
for $e(z)$ via classical fixed-point iterations. The accompanying theorem in [14] shows
that the iterative algorithm converges if $e(z)$ is appropriately initialized. In
particular, the theorem proves the convergence of the iterative method if $e(z)$ is initialized
with $e^{(0)}(z) = -1/z$.
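The fixed-point iteration described above can be sketched in a few lines of Python. This
is a minimal illustration under stated assumptions, not the procedure of [14] verbatim:
the matrix R is represented only by its eigenvalues (so that the traces in (2.13) and
(2.14) become sums over eigenvalues), the integral against the density of T is replaced
by an average over the diagonal entries of a finite T, and c is taken as m/n for the given
finite sizes. The function name, the iteration count and the test values are hypothetical.

```python
def limiting_stieltjes(r_eigs, taus, z, iters=500):
    """Fixed-point iteration for e(z) in (2.14), followed by S_A(z) from (2.13).

    r_eigs : eigenvalues of R (length m)
    taus   : diagonal entries of T (length n)
    """
    m, n = len(r_eigs), len(taus)
    c = m / n
    e = -1.0 / z                      # initialization e^(0)(z) = -1/z
    for _ in range(iters):
        # average of tau / (1 + c*tau*e) over the diagonal of T
        a = sum(t / (1.0 + c * t * e) for t in taus) / n
        # (1/m) tr{ R (a*R - z*I)^{-1} } written in terms of the eigenvalues of R
        e = sum(r / (a * r - z) for r in r_eigs) / m
    a = sum(t / (1.0 + c * t * e) for t in taus) / n
    S = sum(1.0 / (a * r - z) for r in r_eigs) / m
    return S, e


# White-noise example: R = I, T = I, c = 1/2, evaluated at z = -1.
S, e = limiting_stieltjes([1.0, 1.0], [1.0, 1.0, 1.0, 1.0], -1.0)
print(S, e)
```

For R = I and T = I the fixed point equation reduces to the quadratic
c z e^2 + (z + c - 1) e + 1 = 0, so for c = 1/2 and z = -1 the iteration should return the
positive root of e^2 + 3e - 2 = 0, which provides a simple sanity check.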

A more general version of the theorem is stated and proved in [14]. The random
matrix model considered therein is given as the sum of a Hermitian positive
semi-definite matrix and a sum of a finite number of models of the same type as (2.12) with
different and independent constituent matrices satisfying the same conditions as given
in Theorem 2.1.

As a preview, $\mathbf{R}$ is the ensemble correlation matrix of the input to the adaptive
processor. Also, the diagonal elements of $\mathbf{T}$ constitute the windowing applied to the
observation vectors. The examples of the ensemble correlation matrix corresponding
to the input process consisting of a single or multiple plane waves impinging on an
array of sensors are (3.113) and (3.117). Note that the inverse of the square root of
the correlation matrix, $\mathbf{R}^{-1/2}$, is a whitening filter.

In the following, a couple of remarks are made regarding the conditions of
Theorem 2.1. First, the same characterization of the limiting Stieltjes transform holds
if the entries in $\mathbf{X}$ are independent and have finite moments of order $2 + \epsilon$ for some
$\epsilon > 0$. Note that this alleviates the necessity for having identically distributed entries
in $\mathbf{X}$. The existence of moments of order higher than 2 implies that the distributions of
all entries in $\mathbf{X}$ have light tails¹ [13].

Condition  3  requires  that  the  matrix  T  be  diagonal.  The  limiting  Stieltjes  trans¬ 
form  for  a  more  general  version  of  the  model  which  does  not  impose  such  a  restriction 
on  the  matrix  T  is  characterized  in  [65]. 

The sequences in condition 4 of Theorem 2.1 are the empirical eigenvalue
distribution functions corresponding to matrices $\mathbf{T}$ and $\mathbf{R}$, and are by definition upper
bounded by 1. Essentially, tightness means that, with increasingly high probability,
the eigenvalues of the matrices $\mathbf{T}$ and $\mathbf{R}$ do not blow up in the limit as $m \to \infty$.

As a final remark, note that in addition to taking the limit $m \to \infty$, which is
in accordance with the definition of the limiting Stieltjes transform, the number n also
grows large. Condition 5 implies that the ratio between m and n is a finite and non-zero
number, which further implies that m and n grow simultaneously and scale linearly.

2.3.3  The  Eigenvector  Stieltjes  Transform  for  an  important 
model 

The  limiting  Eigenvector  Stieltjes  Transform  of  an  important  class  of  random  matrices 
is  characterized  with  the  following  theorem  [21].  As  in  the  previous  part,  the  random 
matrix  model  considered  in  this  theorem  has  a  similar  form  to  the  SCM  model  used 
in  the  analysis. 


¹More specifically, it implies the Lindeberg-like condition [7].

Theorem 2.2. Let m and n be positive integers and let

$$F_{\mathbf{A}_m}(z) = \mathbf{s}_1^H (\mathbf{A}_m - z\mathbf{I})^{-1} \mathbf{s}_2 \tag{2.15}$$

be the empirical Eigenvector Stieltjes Transform corresponding to the m-by-m matrix

$$\mathbf{A}_m = \frac{1}{n} \mathbf{R}^{1/2} \mathbf{X} \mathbf{X}^H \mathbf{R}^{1/2}. \tag{2.16}$$

Assume the following hypotheses hold:

1. $\mathbf{s}_1$ and $\mathbf{s}_2$ are m-by-1 deterministic vectors with uniformly bounded norms for
all m,

2. $\mathbf{X}$ is an m-by-n matrix with i.i.d. complex entries of zero mean, unit variance
and finite eighth-order moment,

3. $\mathbf{R}^{1/2}$ is an m-by-m Hermitian positive semi-definite square root of the positive
semi-definite Hermitian matrix $\mathbf{R}$,

4. $\mathbf{R}$ has uniformly bounded spectral norm for all m, i.e., $\sup_m \|\mathbf{R}\| < \infty$.

Then, as m and n grow large at the same rate such that $m/n \to c$, where $c \in (0, \infty)$,

$$|F_{\mathbf{A}_m}(z) - F_{\mathbf{A}}(z)| \to 0, \tag{2.17}$$

almost surely for all $z$ with $\mathrm{Im}\{z\} > 0$. The limiting Eigenvector Stieltjes Transform
$F_{\mathbf{A}}(z)$ is given by

$$F_{\mathbf{A}}(z) = \sum_{k=1}^{m} \frac{\mathbf{s}_1^H \mathbf{q}_k \mathbf{q}_k^H \mathbf{s}_2}{\lambda_k \left( 1 - c - c z S_{\mathbf{A}}(z) \right) - z}, \tag{2.18}$$

where $S_{\mathbf{A}}(z)$ is the limiting Eigenvalue Stieltjes Transform corresponding to matrix
$\mathbf{A}_m$ and $\lambda_k$ and $\mathbf{q}_k$ are the eigenvalues and eigenvectors of the matrix $\mathbf{R}$.

Note from the limiting result (2.18) that in general the eigenvectors are coupled
with all eigenvalues via the Eigenvalue Stieltjes Transform $S_{\mathbf{A}}(z)$. However, if
$z \to 0^-$, each term in the sum (2.18) depends on one eigenvector and its corresponding
eigenvalue. This decoupling effect is an implication of the large m, n limit in which all
little perturbations, caused by a large number of random variables which compose
the Eigenvector Stieltjes Transform, get combined in such a way that the variance of
the quadratic form decreases.
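To make the structure of the limiting expression in Theorem 2.2 concrete, the sketch
below evaluates the sum over the eigenvalues of R, with each term of the form
p_k / (lambda_k (1 - c - c z S_A(z)) - z), where p_k denotes the product of the
projections of s_1 and s_2 onto the k-th eigenvector of R. This is only an illustration:
the function name, the symbol p_k and the numerical values are assumptions, and the value
of S_A(z) is supplied by the caller (for example from Theorem 2.1 with T = I).

```python
def limiting_quadratic_form(r_eigs, projections, c, z, S):
    """Limiting eigenvector Stieltjes transform of Theorem 2.2.

    r_eigs      : eigenvalues lambda_k of R
    projections : p_k = (s1^H q_k) * (q_k^H s2) for the eigenvectors q_k of R
    c           : limiting ratio m/n
    S           : limiting eigenvalue Stieltjes transform S_A(z) at the same z
    """
    return sum(p / (lam * (1.0 - c - c * z * S) - z)
               for lam, p in zip(r_eigs, projections))


# White-noise check: for R = I, c = 1/2 and z = -1, S_A(z) is the positive root
# of S^2 + 3S - 2 = 0, and every term has the same denominator.
c, z = 0.5, -1.0
S = (17 ** 0.5 - 3.0) / 2.0
F = limiting_quadratic_form([1.0, 1.0], [0.2, 0.3], c, z, S)
print(F, (0.2 + 0.3) * S)
```

In this white-noise case the limiting quadratic form equals s_1^H s_2 times S_A(z),
because the common denominator 1 - c - c z S_A(z) - z is the reciprocal of S_A(z).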

Note  that  model  (2.16)  is  a  special  case  of  (2.12)  with  T  =  I.  In  fact,  the  limiting 
Eigenvalue  Stieltjes  Transform  of  (2.16),  Sa{z),  can  be  evaluated  using  Theorem  2.1 
with  T  =  I. 

We  point  out  that  the  same  remark  made  with  regard  to  condition  1  of  Theorem  2.1 
also  holds  here.  Finally,  note  the  equivalence  between  conditions  4  of  Theorems  2.1 
and  2.2  corresponding  to  matrix  R. 


2.4  Moments  of  Random  Matrices 

The moments of a random matrix are quantities of our interest. This section defines
the notion of a moment and elaborates how the moments can be computed from the
Stieltjes transform.

The k-th empirical moment $\mathcal{M}_k$ of a non-singular and Hermitian random matrix
$\mathbf{A}_m$ is defined as a normalized trace of the k-th power of its inverse, namely

$$\mathcal{M}_k = \frac{1}{m} \mathrm{tr}\left\{ \mathbf{A}_m^{-k} \right\}. \tag{2.19}$$

Note that the empirical moment $\mathcal{M}_k$ can be expressed in terms of the empirical
eigenvalue density function $\mu_{\mathbf{A}_m}(x)$ as

$$\mathcal{M}_k = \int x^{-k} \mu_{\mathbf{A}_m}(x)\, dx. \tag{2.20}$$

If the limiting Eigenvalue Density Function of a random matrix ensemble of
interest exists, the empirical moment $\mathcal{M}_k$ converges as $m \to \infty$ to a non-random quantity
called the limiting moment $M_k$. The limiting moment is given as an expectation
corresponding to the limiting Eigenvalue Density Function

$$M_k = \lim_{m \to \infty} \frac{1}{m} \mathrm{tr}\left\{ \mathbf{A}_m^{-k} \right\} = \int x^{-k} \mu_{\mathbf{A}}(x)\, dx, \tag{2.21}$$

where the integration is implied over the support of $\mu_{\mathbf{A}}$.

The comparison between (2.6) and (2.19) reveals a relatively simple relation
between the Eigenvalue Stieltjes Transform and the first moment. Although the
empirical Stieltjes transform is formally defined for non-real $z$ (in order to avoid
singularities), its domain is extended to real $z$ which do not fall inside the support of
the underlying eigenvalue density function. Assuming the sequence of matrices $\mathbf{A}_m$
is positive definite, $S_{\mathbf{A}_m}(z)$ is defined for $z$ such that $\mathrm{Im}\{z\} = 0$ and $\mathrm{Re}\{z\} \to 0^-$,
which we compactly represent as $z \to 0^-$. Hence, using (2.6) the empirical moment
$\mathcal{M}_1$ is given by

$$\mathcal{M}_1 = \lim_{z \to 0^-} S_{\mathbf{A}_m}(z). \tag{2.22}$$

Similarly, the limiting first moment $M_1$ is related to the limiting eigenvalue
Stieltjes transform $S_{\mathbf{A}}(z)$ (if it exists) via

$$M_1 = \lim_{z \to 0^-} S_{\mathbf{A}}(z). \tag{2.23}$$
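The higher order empirical and limiting moments can be obtained similarly from
the empirical and limiting Stieltjes transforms. The k-th derivative with respect to
$z$ of the empirical Eigenvalue Stieltjes Transform (2.6) is given by

$$\frac{\partial^k S_{\mathbf{A}_m}(z)}{\partial z^k} = \frac{k!}{m} \mathrm{tr}\left\{ (\mathbf{A}_m - z\mathbf{I})^{-(k+1)} \right\}. \tag{2.24}$$

Assuming the sequence of matrices $\mathbf{A}_m$ is positive definite and the order k is finite,
the k-th derivative of $S_{\mathbf{A}_m}(z)$ is defined as $z \to 0^-$. Comparing (2.24) with (2.19), the
k-th derivative evaluated at $z \to 0^-$ equals $k!$ times the normalized trace of
$\mathbf{A}_m^{-(k+1)}$. Thus, the empirical (k+1)-th moment is obtained from the k-th derivative
of the empirical eigenvalue Stieltjes transform $S_{\mathbf{A}_m}(z)$ as

$$\mathcal{M}_{k+1} = \frac{1}{k!} \lim_{z \to 0^-} \frac{\partial^k S_{\mathbf{A}_m}(z)}{\partial z^k}, \tag{2.25}$$

which reduces to (2.22) for k = 0.

Finally, the limiting (k+1)-th moment is given from the k-th derivative of the
limiting Eigenvalue Stieltjes Transform by

$$M_{k+1} = \frac{1}{k!} \lim_{z \to 0^-} \frac{\partial^k S_{\mathbf{A}}(z)}{\partial z^k}. \tag{2.26}$$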
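The link between the Stieltjes transform near the origin and the normalized traces of
inverse powers can be checked numerically. The sketch below is an illustration only: it
uses hypothetical eigenvalues of a positive definite matrix, evaluates the empirical
Stieltjes transform at real arguments near zero (which lie outside the spectrum), and
approximates the first derivative with a central finite difference. The value of the
transform at the origin gives (1/m) tr{A^{-1}} and its first derivative gives
(1/m) tr{A^{-2}}.

```python
def stieltjes(eigs, z):
    """Empirical Stieltjes transform (1/m) * sum_k 1/(lambda_k - z)."""
    return sum(1.0 / (lam - z) for lam in eigs) / len(eigs)


def inverse_moment(eigs, k):
    """Normalized trace of the k-th power of the inverse: (1/m) * sum_k lambda_k^(-k)."""
    return sum(lam ** (-k) for lam in eigs) / len(eigs)


# Hypothetical eigenvalues of a positive definite matrix (illustrative only).
eigs = [0.5, 1.0, 2.0]
h = 1e-5

value_at_zero = stieltjes(eigs, 0.0)
# Central difference; valid because z = 0 is outside the spectrum of the matrix.
first_derivative = (stieltjes(eigs, h) - stieltjes(eigs, -h)) / (2.0 * h)

print(value_at_zero, inverse_moment(eigs, 1))
print(first_derivative, inverse_moment(eigs, 2))
```

Higher derivatives can be handled in the same way, keeping in mind that the k-th
derivative carries a factor of k!.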


This method is used to evaluate the limiting moments needed for the performance
analysis of adaptive processors operating in the deficient sample support regime. As a
final note, in a more general framework, other important quantities in random matrix
theory are characterized using the Stieltjes transform and similar methods [22]. The
obtained characterizations are commonly known as Girko estimators.


2.5  Sample  Correlation  Matrix  Models 

The  models  of  the  sample  correlation  matrices  used  in  this  thesis  are  introduced  in 
this  section.  Also,  the  limiting  Stieltjes  transforms  corresponding  to  these  models  are 
characterized  and  the  limiting  moments  are  evaluated.  The  limiting  eigenvalue  density 
functions  of  two  SCM  models  corresponding  to  white  noise  process  are  presented  in 
the  last  part. 

2.5.1  The  Moments  of  the  Exponentially  Weighted  SCM 

The exponentially weighted SCM computed at discrete time n from observation
vectors $\mathbf{u}(k)$ of dimension m, received at discrete times $k = 1, 2, \ldots, n$, is defined as

$$\hat{\mathbf{R}}(n) = \sum_{k=1}^{n} \lambda^{n-k} \mathbf{u}(k) \mathbf{u}^H(k), \tag{2.27}$$

where $\lambda \in (0, 1)$ is an exponential forgetting factor. The exponential forgetting factor
$\lambda$ attenuates past observations not relevant for the estimate of the correlation matrix at
the current time instant. The forgetting factor $\lambda$ is usually very close to 1.

Note that the effective number of observations is controlled by the value of the
forgetting factor $\lambda$. That is, if $\lambda$ is small, only a few of the most recent observations impact the
SCM (2.27). A common rule of thumb is that the effective number of observations
$n_{\mathrm{eff}}$ is $1/(1 - \lambda)$ [27].
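One way to see where this rule of thumb comes from is to add up the weights
$\lambda^{n-k}$ applied to the observations: the geometric sum equals
$(1 - \lambda^n)/(1 - \lambda)$ and approaches $1/(1 - \lambda)$ for large n. The short
Python sketch below illustrates this interpretation; the function name and the numerical
values are assumptions made for the example.

```python
def total_weight(lam, n):
    """Sum of the exponential weights lam^(n-k) for k = 1, ..., n."""
    return sum(lam ** (n - k) for k in range(1, n + 1))


# Hypothetical forgetting factors; with many observations the total weight
# approaches 1 / (1 - lam).
for lam in (0.9, 0.99):
    print(lam, total_weight(lam, 5000), 1.0 / (1.0 - lam))
```

For a forgetting factor of 0.99, for instance, the total weight is close to 100, so
roughly the last hundred observations determine the estimate.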

The effective number of observations is a relevant quantity in the study of adaptive
processing with deficient sample support when exponential weighting of the
observation vectors is employed.

Random  Matrix  Model  for  the  SCM 

If the ensemble correlation matrix of the arrival process is $\mathbf{R}$, the observation vector
$\mathbf{u}(k)$ is modeled as a colored process

$$\mathbf{u}(k) = \mathbf{R}^{1/2} \mathbf{x}(k), \tag{2.28}$$

where $\mathbf{R}^{1/2}$ is a positive semi-definite square root of the correlation matrix $\mathbf{R}$ and
$\mathbf{x}(k) \in \mathbb{C}^{m}$ is a vector of i.i.d. zero mean, unit variance entries.

From (2.28) and (2.27), the exponentially weighted SCM can be compactly
represented by

$$\hat{\mathbf{R}}(n) = \mathbf{R}^{1/2} \mathbf{X} \mathbf{\Lambda}(n) \mathbf{X}^H \mathbf{R}^{1/2}, \tag{2.29}$$

where

1. $\mathbf{X}$ is an m-by-n matrix whose k-th column, $\mathbf{x}(k)$, is an observation vector of a zero
mean, unit power white noise and

2. $\mathbf{\Lambda}(n) = \mathrm{diag}(\lambda^{n-1}, \lambda^{n-2}, \ldots, 1)$ is a diagonal matrix of order n composed of
the powers of the forgetting factor $\lambda$.
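In practice the exponentially weighted SCM is rarely formed from the full sum. A standard
identity, not stated in the text above but following directly from definition (2.27), is
the rank-one recursion in which the previous SCM is scaled by the forgetting factor and
the outer product of the newest observation is added. The Python sketch below compares
the recursive and direct computations on hypothetical real-valued observations; all names
and data are assumptions chosen for illustration.

```python
def outer(u):
    """Outer product u u^H as a nested list (conjugation is a no-op for real data)."""
    return [[ui * uj.conjugate() for uj in u] for ui in u]


def scm_direct(observations, lam):
    """Exponentially weighted SCM from the definition: sum_k lam^(n-k) u(k) u(k)^H."""
    n, m = len(observations), len(observations[0])
    R = [[0.0] * m for _ in range(m)]
    for k, u in enumerate(observations, start=1):
        w = lam ** (n - k)
        uu = outer(u)
        for i in range(m):
            for j in range(m):
                R[i][j] += w * uu[i][j]
    return R


def scm_recursive(observations, lam):
    """Same matrix via the recursion R(n) = lam * R(n-1) + u(n) u(n)^H."""
    m = len(observations[0])
    R = [[0.0] * m for _ in range(m)]
    for u in observations:
        uu = outer(u)
        R = [[lam * R[i][j] + uu[i][j] for j in range(m)] for i in range(m)]
    return R


# Hypothetical observation vectors of dimension 2.
obs = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
print(scm_direct(obs, 0.9))
print(scm_recursive(obs, 0.9))
```

Both functions return the same matrix up to rounding; the recursive form is the one
that underlies exponentially weighted RLS-type updates.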

The  Limiting  First  Moment  of  the  Exponentially  Weighted  SCM 

In the following, the limiting first moment $M_1$ corresponding to the exponentially weighted SCM is evaluated. As elaborated in Section 2.4, the first step is to characterize the limiting Eigenvalue Stieltjes Transform. This is done by using the result of Theorem 2.1. Then, the limiting first moment is obtained from the limiting Stieltjes transform by taking the limit $z \to 0^{-}$.

The exponentially weighted SCM (2.29) is equivalent to the matrix (2.12) up to scaling when $\mathbf{T} = \boldsymbol{\Lambda}(n)$. Namely,

$$\mathbf{A}_m = \frac{1}{n}\hat{\mathbf{R}}(n). \tag{2.30}$$

Using (2.8), their limiting Eigenvalue Stieltjes Transforms are related as

$$S_{\mathbf{A}}(z) = n S_{\hat{\mathbf{R}}}(nz), \tag{2.31}$$

where $S_{\hat{\mathbf{R}}}(z)$ is the limiting eigenvalue Stieltjes transform corresponding to the exponentially weighted SCM $\hat{\mathbf{R}}(n)$.

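The empirical Eigenvalue Density Function of $\boldsymbol{\Lambda}(n)$ is given by

$$\mu_{\boldsymbol{\Lambda}}(\tau) = \frac{1}{n}\sum_{k=1}^{n}\delta_D\left(\tau - \lambda^{n-k}\right). \tag{2.32}$$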


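The substitution of (2.30) and (2.32) into (2.13) yields the characterization of the limiting Eigenvalue Stieltjes Transform of the SCM $\hat{\mathbf{R}}$,

$$n S_{\hat{\mathbf{R}}}(nz) = \frac{1}{m}\mathrm{tr}\left\{\left(\frac{1}{n}\sum_{k=1}^{n}\frac{\lambda^{n-k}}{1 + c\lambda^{n-k}e(z)}\,\mathbf{R} - z\mathbf{I}\right)^{-1}\right\}, \tag{2.33}$$

where $e(z)$ is obtained by substituting (2.32) into (2.14) as

$$e(z) = \frac{1}{m}\mathrm{tr}\left\{\mathbf{R}\left(\frac{1}{n}\sum_{k=1}^{n}\frac{\lambda^{n-k}}{1 + c\lambda^{n-k}e(z)}\,\mathbf{R} - z\mathbf{I}\right)^{-1}\right\}. \tag{2.34}$$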


Taking the limit $z \to 0^{-}$ (i.e., $\mathrm{Im}\{z\} = 0$ and $\mathrm{Re}\{z\} \to 0^{-}$) in (2.34) and denoting $e(0) = \lim_{z \to 0^{-}} e(z)$ yields

$$\frac{1}{e(0)} = \frac{1}{n}\sum_{k=1}^{n}\frac{\lambda^{n-k}}{1 + c\lambda^{n-k}e(0)}. \tag{2.35}$$


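Similarly, taking the limit $z \to 0^{-}$ in (2.33) yields

$$n M_1 = \frac{1}{m}\left(\frac{1}{n}\sum_{k=1}^{n}\frac{\lambda^{n-k}}{1 + c\lambda^{n-k}e(0)}\right)^{-1}\mathrm{tr}\left\{\mathbf{R}^{-1}\right\}. \tag{2.36}$$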


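We observe from (2.35) and (2.36) that $nM_1 = \frac{1}{m}\mathrm{tr}\{\mathbf{R}^{-1}\}e(0)$ and solve for $e(0)$ in terms of $M_1$. After substituting the result in (2.35) and rearranging the obtained expression with $c = m/n$, the limiting first moment is finally given as a fixed point solution to

$$\frac{1}{M_1} = \sum_{k=1}^{n}\frac{\lambda^{n-k}}{\frac{1}{m}\mathrm{tr}\left\{\mathbf{R}^{-1}\right\} + m\lambda^{n-k}M_1}. \tag{2.37}$$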


Recall that the fixed-point iterative method, if appropriately initialized, converges to the unique solution of the fixed point equation (2.14). Since the fixed point equation (2.35) is obtained from (2.14) by taking the limit $z \to 0^{-}$, we conclude that (2.35) has a unique solution which can be found by employing the iterative algorithm. Consequently, the limiting moment $M_1$ is given as the unique solution to the fixed-point equation (2.37).

Given that the iterative algorithm initialized with $e^{(0)}(z) = -1/z$ converges to the unique solution of (2.14) [14], and that $-1/z$ grows without bound as $z \to 0^{-}$, the iterative procedures for solving (2.35) and (2.37) should in principle be initialized with a number of large magnitude. However, since the solutions to (2.35) and (2.37) are positive reals, an initialization with a small positive number suffices. The simulations performed to validate the results in later chapters confirm this observation.
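The iteration described above can be sketched as follows. The code iterates (2.35) in the form $1/e(0) = \frac{1}{n}\sum_{k}\lambda^{n-k}/(1 + c\lambda^{n-k}e(0))$ with $c = m/n$ and then uses the relation $nM_1 = \frac{1}{m}\mathrm{tr}\{\mathbf{R}^{-1}\}e(0)$ stated in the text. The function name, the diagonal test matrix, and the iteration count are assumptions made for illustration, not the procedure used in the thesis simulations.

```python
import numpy as np

def first_moment_exp_weighted(R, n, lam, iters=500):
    """Iterate (2.35) for e(0), then map to M_1 via n*M_1 = tr{R^-1} e(0) / m."""
    m = R.shape[0]
    c = m / n
    w = lam ** np.arange(n - 1, -1, -1)       # weights lam**(n-k), k = 1..n
    e = 1e-3                                  # small positive initialization
    for _ in range(iters):
        e = 1.0 / np.mean(w / (1.0 + c * w * e))
    tr_inv = np.trace(np.linalg.inv(R)).real
    return tr_inv * e / (n * m)

m, n = 8, 40
R = np.diag(np.linspace(0.5, 2.0, m))         # hypothetical ensemble correlation matrix

# lam = 1: the fixed point should reproduce tr{R^-1} / (m (n - m)).
M1_rect = first_moment_exp_weighted(R, n, lam=1.0)
M1_closed = np.trace(np.linalg.inv(R)) / (m * (n - m))

# lam < 1: no closed form, the moment comes from the iteration only.
M1_exp = first_moment_exp_weighted(R, n=400, lam=0.95)
```

With $\lambda = 1$ the update reduces to $e \leftarrow 1 + ce$, which converges to $1/(1-c)$ for $c < 1$ and recovers the closed form that follows from (2.37).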


Note that in the special case when $\lambda = 1$, the limiting moment $M_1$ is, using (2.37), given in a closed form

$$M_1 = \frac{1}{m(n-m)}\mathrm{tr}\left\{\mathbf{R}^{-1}\right\}. \tag{2.38}$$

2.5.2 The Moments and Inverse of the Rectangularly Windowed SCM


The SCM evaluated from n rectangularly windowed observation vectors of dimension m has been introduced in Section 1.1.2. For completeness we repeat the definition here. Namely, given the stationary observation vectors $\mathbf{u}(k)$ at discrete times $k = 1, 2, \ldots, n$, the rectangularly windowed SCM is given by

$$\hat{\mathbf{R}} = \frac{1}{n}\sum_{k=1}^{n}\mathbf{u}(k)\mathbf{u}^{H}(k). \tag{2.39}$$


As already elaborated, if the ensemble correlation matrix of the arrival process is $\mathbf{R}$, the observation vector $\mathbf{u}(k)$ is modeled using (2.28). Substituting (2.28) into (2.39), the rectangularly windowed SCM is compactly represented as

$$\hat{\mathbf{R}} = \frac{1}{n}\mathbf{R}^{1/2}\mathbf{X}\mathbf{X}^{H}\mathbf{R}^{1/2}, \tag{2.40}$$

where $\mathbf{X}$ is an m-by-n matrix of i.i.d. zero mean, unit variance entries. Its columns model observations of a zero mean, unit power white noise process.

The SCM in (2.39) is usually diagonally loaded in order to improve the condition number of the resulting matrix and for some additional reasons that will become clear in Chapter 3. The diagonally loaded SCM is given by

$$\hat{\mathbf{R}}_{\delta} = \hat{\mathbf{R}} + \delta\mathbf{I}, \tag{2.41}$$

where $\delta > 0$ is a diagonal loading parameter.

The limiting eigenvalue Stieltjes transforms of $\hat{\mathbf{R}}$ and $\hat{\mathbf{R}}_{\delta}$ are characterized in the following part.

The Limiting Eigenvalue Stieltjes Transform of the SCM

The limiting eigenvalue Stieltjes transforms corresponding to the unloaded SCM $\hat{\mathbf{R}}$ and the diagonally loaded SCM $\hat{\mathbf{R}}_{\delta}$ are, from (2.8), related as

$$S_{\hat{\mathbf{R}}_{\delta}}(z) = S_{\hat{\mathbf{R}}}(z - \delta). \tag{2.42}$$


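The limiting Stieltjes transform $S_{\hat{\mathbf{R}}}(z)$ is found using Theorem 2.1 by setting $\mathbf{T} = \mathbf{I}$ in (2.12). Since $\mu_{\mathbf{T}}(\tau) = \delta_D(\tau - 1)$, the Stieltjes transform $S_{\hat{\mathbf{R}}}(z)$ is, using (2.13), expressed as

$$S_{\hat{\mathbf{R}}}(z) = \frac{1}{m}\mathrm{tr}\left\{\left(\int\frac{\tau\,\delta_D(\tau - 1)\,d\tau}{1 + c\tau e(z)}\,\mathbf{R} - z\mathbf{I}\right)^{-1}\right\} = \frac{1}{m}\sum_{k=1}^{m}\frac{1}{(1 + ce(z))^{-1}\lambda_k - z}, \tag{2.43}$$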


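where $e(z)$ is the solution to the fixed point equation (2.14), expressed as

$$e(z) = \frac{1}{m}\mathrm{tr}\left\{\mathbf{R}\left(\int\frac{\tau\,\delta_D(\tau - 1)\,d\tau}{1 + c\tau e(z)}\,\mathbf{R} - z\mathbf{I}\right)^{-1}\right\} = \frac{1}{m}\sum_{k=1}^{m}\frac{\lambda_k}{(1 + ce(z))^{-1}\lambda_k - z}. \tag{2.44}$$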


To derive a more compact representation of $S_{\hat{\mathbf{R}}}(z)$, we introduce a new variable $t(z) = 1 + ce(z)$ which, using (2.44), is given by

$$t(z) = 1 + \frac{c}{m}\sum_{k=1}^{m}\frac{t(z)\lambda_k}{\lambda_k - zt(z)}. \tag{2.45}$$


Equivalently,

$$t^{-1}(z) = 1 - \frac{c}{m}\sum_{k=1}^{m}\frac{\lambda_k}{\lambda_k - zt(z)}. \tag{2.46}$$

The Stieltjes transform $S_{\hat{\mathbf{R}}}(z)$ is, using (2.43), expressed in terms of $t(z)$ as

$$S_{\hat{\mathbf{R}}}(z) = \frac{t(z)}{m}\sum_{k=1}^{m}\frac{1}{\lambda_k - zt(z)}. \tag{2.47}$$

Rearranging the summand in (2.46) as $\lambda_k/(\lambda_k - zt(z)) = 1 + zt(z)/(\lambda_k - zt(z))$ and using (2.47) yields

$$t^{-1}(z) = 1 - c - czS_{\hat{\mathbf{R}}}(z). \tag{2.48}$$


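Finally, the limiting Stieltjes transform corresponding to the SCM $\hat{\mathbf{R}}$ is obtained after substituting (2.48) into (2.47) and is given by

$$S_{\hat{\mathbf{R}}}(z) = \frac{1}{m}\sum_{k=1}^{m}\frac{1}{\left(1 - c - czS_{\hat{\mathbf{R}}}(z)\right)\lambda_k - z}. \tag{2.49}$$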


Note the similarity in the forms of (2.49) and (2.18). In addition, (2.49) can be readily obtained from the characterization of the limiting Stieltjes transform given in [21] and, after some algebraic manipulation, from the characterization given in [65].

The limiting Stieltjes transform corresponding to a diagonally loaded SCM is, using (2.42) and (2.49), given as the solution to

$$S_{\hat{\mathbf{R}}_{\delta}}(z) = \frac{1}{m}\sum_{k=1}^{m}\frac{1}{\left(1 - c - c(z - \delta)S_{\hat{\mathbf{R}}_{\delta}}(z)\right)\lambda_k - z + \delta}. \tag{2.50}$$

Note that the existence and uniqueness of the solution to the above fixed point equation follows from the existence and uniqueness of the solution to (2.14). In addition, the fixed-point iterative procedure converges to the solution for appropriate initialization.


The  Limiting  Moments  of  the  Diagonally  Loaded  SCM 

The limiting moments $M_1$ and $M_2$ corresponding to a diagonally loaded SCM $\hat{\mathbf{R}}_{\delta}$ are evaluated in this part. These results are exploited in the theoretical analysis in Section 3.5.

As elaborated in Section 2.4, the limiting moment $M_1$ is obtained by taking the limit $z \to 0^{-}$ in (2.50) and is given by the solution to a fixed point equation

$$M_1 = \frac{1}{m}\sum_{k=1}^{m}\frac{1}{\left(1 - c + c\delta M_1\right)\lambda_k + \delta}. \tag{2.51}$$


Note that (2.38) and (2.51) for $\delta = 0$ are identical up to a scaling factor n because the SCM corresponding to (2.38) for $\lambda = 1$ is not normalized with n.

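The limiting moment $M_2$ is evaluated, using (2.26), by taking the limit $z \to 0^{-}$ of the first derivative of $S_{\hat{\mathbf{R}}_{\delta}}(z)$ with respect to z. After a few algebraic steps omitted here, the moment $M_2$ is given as the solution to

$$M_2 = \frac{1}{m}\sum_{k=1}^{m}\frac{c\left(M_1 - \delta M_2\right)\lambda_k + 1}{\left(\left(1 - c + c\delta M_1\right)\lambda_k + \delta\right)^{2}}. \tag{2.52}$$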


Given that the solution to the fixed point equation (2.50) exists and is unique, the solutions to (2.51) and (2.52) exist and are unique. These fixed point equations can be solved via the classical iterative method, which converges if appropriately initialized. Given that the considered SCM is diagonally loaded with $\delta$, we initialize the iterative methods with $1/\delta$ and $1/\delta^{2}$, respectively.
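A minimal sketch of this procedure is given below. It assumes the fixed point equations in the forms $M_1 = \frac{1}{m}\sum_k 1/((1 - c + c\delta M_1)\lambda_k + \delta)$ and $M_2 = \frac{1}{m}\sum_k (c(M_1 - \delta M_2)\lambda_k + 1)/((1 - c + c\delta M_1)\lambda_k + \delta)^2$. Because the second equation is linear in $M_2$ once $M_1$ is known, the sketch solves it directly instead of iterating, which is a simplification chosen here. The function name and the test values are hypothetical.

```python
import numpy as np

def loaded_scm_moments(eigs, c, delta, iters=200):
    """Fixed-point iteration for M_1, then a direct linear solve for M_2."""
    M1 = 1.0 / delta                                   # initialization suggested in the text
    for _ in range(iters):
        M1 = np.mean(1.0 / ((1 - c + c * delta * M1) * eigs + delta))
    D = (1 - c + c * delta * M1) * eigs + delta        # common denominator terms
    A = np.mean(eigs / D**2)
    B = np.mean(1.0 / D**2)
    # M2 = c*(M1 - delta*M2)*A + B, which is linear in M2 for fixed M1.
    M2 = (c * M1 * A + B) / (1 + c * delta * A)
    return M1, M2

eigs = np.ones(16)            # white noise: all ensemble eigenvalues equal 1
c, delta = 0.5, 0.1
M1, M2 = loaded_scm_moments(eigs, c, delta)
```

For white noise the first equation collapses to the quadratic $c\delta M_1^2 + (1 - c + \delta)M_1 - 1 = 0$, and $M_2$ should equal $-\partial M_1/\partial\delta$, which gives two independent sanity checks on the sketch.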


The  Inverse  of  the  Rectangularly  Windowed  SCM 

This part characterizes the inverse of the SCM (2.39) evaluated from n rectangularly windowed observation vectors of dimension m. Assuming the observation vectors have ensemble correlation matrix $\mathbf{R}$, the SCM is modeled as in (2.40). The limiting Eigenvector Stieltjes Transform for this model is characterized in Theorem 2.2.

Since the probability of receiving two observation vectors that are identical up to a scaling is zero, the SCM $\hat{\mathbf{R}}$ is non-singular when $n > m$.^1 Therefore, the support of the empirical Eigenvector Stieltjes Transform (2.9) can be extended to include $z$ with $\mathrm{Im}\{z\} = 0$ and $\mathrm{Re}\{z\} < 0$. Taking the limit $z \to 0^{-}$ (i.e., $\mathrm{Im}\{z\} = 0$ and $\mathrm{Re}\{z\} \to 0^{-}$) in (2.17) and (2.18) yields that as $m, n \to \infty$ such that $m/n \to c \in (0, 1)$,

$$\mathbf{s}_1^{H}\hat{\mathbf{R}}^{-1}\mathbf{s}_2 \to \sum_{k=1}^{m}\frac{\mathbf{s}_1^{H}\mathbf{q}_k\mathbf{q}_k^{H}\mathbf{s}_2}{\lambda_k(1 - c)} \quad \text{a.s.}, \tag{2.53}$$

where $\lambda_k$ and $\mathbf{q}_k$ are the eigenvalues and eigenvectors of the ensemble correlation matrix $\mathbf{R}$, and $\mathbf{s}_1$ and $\mathbf{s}_2$ are non-zero deterministic vectors with uniformly bounded norms.

^1 Note, however, that two observation vectors can be close to each other at very high SNRs, leading to an ill-conditioned SCM.

Noting that the limiting quantity in (2.53) is the quadratic form corresponding to the inverse of the ensemble correlation matrix $\mathbf{R}$ scaled by $1/(1 - c)$, it is finally concluded that as $m, n \to \infty$ such that $m/n \to c \in (0, 1)$,

$$\hat{\mathbf{R}}^{-1} \to \frac{1}{1 - c}\mathbf{R}^{-1} \quad \text{a.s.} \tag{2.54}$$
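The scaling by $1/(1 - c)$ can be illustrated with a single realization of the SCM. The sketch below assumes complex Gaussian observations, a diagonal ensemble correlation matrix, and one fixed unit-norm test vector; all of these are illustrative choices rather than conditions required by the result.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 200, 800
c = m / n

eigs = np.linspace(0.5, 2.0, m)               # hypothetical ensemble eigenvalues
R_half = np.diag(np.sqrt(eigs))               # R is diagonal in this sketch
X = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
U = R_half @ X
R_hat = U @ U.conj().T / n                    # rectangularly windowed SCM

s = np.ones(m) / np.sqrt(m)                   # deterministic unit-norm test vector
sample_form = (s.conj() @ np.linalg.solve(R_hat, s)).real   # s^H R_hat^{-1} s
limit_form = np.sum(np.abs(s) ** 2 / eigs) / (1 - c)        # s^H R^{-1} s / (1 - c)
ratio = sample_form / limit_form
```

For these sizes the ratio should be close to 1, while ignoring the factor $1/(1 - c)$ would underestimate the quadratic form by about 25 percent at $c = 0.25$.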


2.5.3  White  Noise  Process 

This part summarizes results concerning the limiting eigenvalue density functions corresponding to the exponentially weighted and rectangularly windowed SCMs of a white noise process.


Rectangularly  Windowed  SCM 

The SCM of n rectangularly weighted observation vectors of zero mean, unit variance white noise of dimension m is modeled as^2

$$\hat{\mathbf{R}} = \frac{1}{n}\mathbf{X}\mathbf{X}^{H}, \tag{2.55}$$

where $\mathbf{X}$ is an m-by-n matrix with i.i.d. zero mean, unit variance entries.

The limiting Stieltjes transform corresponding to (2.55) is obtained from Theorem 2.1 with $\mathbf{T} = \mathbf{I}$ and $\mathbf{R} = \mathbf{I}$. Employing the Perron-Stieltjes inversion formula (2.4), the corresponding limiting Eigenvalue Density Function is, in the limit when $m, n(m) \to \infty$ at the same rate such that $m/n \to c \in (0, \infty)$, obtained in a closed form as

$$\mu_{\hat{\mathbf{R}}}(x) = \max\left(1 - \frac{1}{c}, 0\right)\delta_D(x) + \frac{\sqrt{(x - a)(b - x)}}{2\pi cx}I_{[a,b]}(x), \tag{2.56}$$

where $I_{[a,b]}(x) = 1$ for $a \le x \le b$ and is zero otherwise. The support of this function is compact with endpoints $a = (1 - \sqrt{c})^{2}$ and $b = (1 + \sqrt{c})^{2}$. In addition, all the eigenvalues of (2.55) almost surely fall within the support of $\mu_{\hat{\mathbf{R}}}(x)$ in the limit $m, n(m) \to \infty$ [4]. Finally, note the non-zero mass at 0 when $\hat{\mathbf{R}}$ is not full-rank.

^2 The cases of input processes of non-unit variance are easily accommodated by including proper scaling.

This result is widely known as the Marcenko-Pastur law [33]. Some other versions of the law, which do not require identically distributed entries in $\mathbf{X}$, are reviewed in [54]. Essentially, the condition that the entries in $\mathbf{X}$ are identically distributed can be abandoned at the expense of imposing the requirement that the entries have finite moments of order $2 + \epsilon$ for some $\epsilon > 0$ [13].

The plots of the Marcenko-Pastur law (2.56) for different values of the parameter c are shown in Fig. 2-1. Note that the ensemble eigenvalue of the considered process is 1 (of multiplicity m). Recall that $c^{-1}$ represents the average number of observations per dimension. Therefore, it is intuitively clear that as c decreases, i.e., as more observations per dimension become available, the eigenvalues of the SCM concentrate around the ensemble eigenvalue. In other words, the SCM more accurately estimates the ensemble correlation matrix.

As a final remark, it can be noted from Fig. 2-1 that the limiting Eigenvalue Density Function is not symmetric around the ensemble eigenvalue. More specifically, the plots show a relatively high probability of observing an eigenvalue smaller than 1. This implies that the SCM might not be well conditioned when n is not sufficiently larger than m. We will return to this observation in Section 4.5.3.
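The asymmetry can be quantified by integrating the continuous part of the Marcenko-Pastur density numerically. The sketch below covers $c < 1$ only, so the point mass at zero is absent; the function names and the grid size are assumptions made for illustration.

```python
import numpy as np

def marchenko_pastur_pdf(x, c):
    """Continuous part of the Marcenko-Pastur density; no point mass included."""
    a, b = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
    x = np.asarray(x, dtype=float)
    pdf = np.zeros_like(x)
    inside = (x > a) & (x < b)
    pdf[inside] = np.sqrt((x[inside] - a) * (b - x[inside])) / (2 * np.pi * c * x[inside])
    return pdf

def trapezoid(y, x):
    """Plain trapezoidal rule, written out to avoid NumPy version differences."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

for c in (0.1, 0.5):
    a, b = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
    x = np.linspace(a, b, 200001)
    f = marchenko_pastur_pdf(x, c)
    area = trapezoid(f, x)                        # total mass, close to 1 for c < 1
    mean = trapezoid(x * f, x)                    # first moment, close to 1
    below_one = trapezoid(f[x < 1], x[x < 1])     # mass of eigenvalues smaller than 1
```

The mean stays at the ensemble eigenvalue while more than half of the mass lies below 1, which is the skew toward small eigenvalues described above.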

The  Moments  of  the  SCM  of  White  Noise 

The moments corresponding to the SCM of a white noise process can be evaluated from the limiting Eigenvalue Density Function (2.56) using (2.20).

Figure 2-1: The limiting Eigenvalue Density Function corresponding to the SCM of a zero mean, unit variance white noise process for different values of the parameter c. The Eigenvalue Density Function corresponding to the ensemble correlation matrix is $\delta_D(x - 1)$ (shown in green).


For reasons that will become clear in Chapter 4, we are interested in computing the limiting moments $M_1$ and $M_2$ of the model

$$\boldsymbol{\Phi} = n\hat{\mathbf{R}} + \delta\mathbf{I}, \tag{2.57}$$

where $\hat{\mathbf{R}}$ is defined in (2.55).

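Since the eigenvalues of $\boldsymbol{\Phi}$ and the rectangularly windowed SCM $\hat{\mathbf{R}}$ are related through

$$\lambda_k(\boldsymbol{\Phi}) = n\lambda_k(\hat{\mathbf{R}}) + \delta, \quad k = 1, 2, \ldots, m, \tag{2.58}$$

the corresponding limiting eigenvalue density functions are related as

$$\mu_{\boldsymbol{\Phi}}(x) = \frac{1}{n}\mu_{\hat{\mathbf{R}}}\left(\frac{x - \delta}{n}\right). \tag{2.59}$$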

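Using (2.56), (2.59), and (2.20), the limiting moments $M_1$ and $M_2$ are evaluated in closed form and given by

$$M_1 = \max\left(1 - \frac{n}{m}, 0\right)\frac{1}{\delta} + \frac{\sqrt{\delta^{2} + (m - n)^{2} + 2\delta(m + n)} - |n - m| - \delta}{2\delta m} \tag{2.60}$$

and

$$M_2 = \max\left(1 - \frac{n}{m}, 0\right)\frac{1}{\delta^{2}} - \frac{|n - m|}{2\delta^{2}m} + \frac{(m - n)^{2} + \delta(m + n)}{2\delta^{2}m\sqrt{\delta^{2} + (m - n)^{2} + 2\delta(m + n)}}. \tag{2.61}$$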
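The closed forms can be cross-checked numerically. The sketch below assumes $M_1$ and $M_2$ denote the limits of $\frac{1}{m}\mathrm{tr}\{\boldsymbol{\Phi}^{-1}\}$ and $\frac{1}{m}\mathrm{tr}\{\boldsymbol{\Phi}^{-2}\}$, writes the two moments with an explicit point-mass term $\max(1 - n/m, 0)$ for the case $m > n$, and compares them with one realization of $\boldsymbol{\Phi} = \mathbf{X}\mathbf{X}^{H} + \delta\mathbf{I}$. The function name, the sizes, and the complex Gaussian entries are illustrative assumptions.

```python
import numpy as np

def phi_moments(m, n, delta):
    """Closed-form limiting moments of Phi = n*R_hat + delta*I for white noise."""
    root = np.sqrt(delta**2 + (m - n) ** 2 + 2 * delta * (m + n))
    mass = max(1 - n / m, 0.0)               # point mass at delta when m > n
    M1 = mass / delta + (root - abs(n - m) - delta) / (2 * delta * m)
    M2 = (mass / delta**2 - abs(n - m) / (2 * delta**2 * m)
          + ((m - n) ** 2 + delta * (m + n)) / (2 * delta**2 * m * root))
    return M1, M2

rng = np.random.default_rng(2)
m, n, delta = 60, 120, 5.0
M1, M2 = phi_moments(m, n, delta)

# One realization of Phi = X X^H + delta I for comparison.
X = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
eig = np.linalg.eigvalsh(X @ X.conj().T) + delta
M1_emp, M2_emp = np.mean(1 / eig), np.mean(1 / eig**2)
```

Two deterministic checks are also available under the same assumption: $M_1$ should satisfy $\delta mM_1^{2} + (n - m + \delta)M_1 - 1 = 0$, and $M_2$ should equal $-\partial M_1/\partial\delta$.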


Exponentially  Weighted  SCM 


Here we consider another version of a scaled SCM. It is computed from n observations of a zero mean, unit variance white noise process whose observation vectors are exponentially weighted with forgetting factor $\lambda$ as

$$\boldsymbol{\Phi} = (1 - \lambda)\mathbf{X}\boldsymbol{\Lambda}\mathbf{X}^{H}, \tag{2.62}$$

where $\mathbf{X}$ is an m-by-n matrix of i.i.d. zero mean, unit variance entries and $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda^{n-1}, \ldots, \lambda, 1)$. Note that $\boldsymbol{\Phi}$ is a consistent estimator in the limit as $n \to \infty$ when m and $\lambda$ are finite and fixed.

The limiting Stieltjes transform corresponding to (2.62) can be characterized using Theorem 2.1 with $\mathbf{T} = \boldsymbol{\Lambda}$ and accounting for the scaling factor $1 - \lambda$. It is shown that as $n, m, 1/(1 - \lambda) \to \infty$ at a non-zero, finite rate $q = m(1 - \lambda) \in (0, \infty)$, the limiting Stieltjes transform of $\boldsymbol{\Phi}$ is given as the unique solution to [9]

$$zqS_{\boldsymbol{\Phi}}(z) = q - \log\left(1 - qS_{\boldsymbol{\Phi}}(z)\right). \tag{2.63}$$


As discussed with regard to Theorem 2.1, (2.63) can be solved via fixed-point iterations, which, if appropriately initialized, converge to the unique solution. The limiting eigenvalue density function of the scaled and exponentially weighted SCM (2.62) is then obtained from the inversion formula (2.4). Note that the parameter $q = m(1 - \lambda)$ is the ratio of the number of dimensions to the effective number of observations, so that $q^{-1}$ represents the effective average number of observations per dimension. In that, q is analogous to the parameter c in the Marcenko-Pastur law.

The limiting eigenvalue density (obtained from (2.63) and (2.4)) has compact support whose lower and upper edges $x_1$ and $x_2$ are the solutions to [9]

$$x = \log x + q + 1. \tag{2.64}$$

As a final remark, we point out that the plots of the limiting eigenvalue density functions of the rectangularly windowed (Marcenko-Pastur law) and exponentially weighted SCM look alike for appropriately chosen parameters c and q [9]. This observation is exploited in the analysis in Section 4.7.1.
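The support edges of the exponentially weighted case are the two roots of $x = \log x + q + 1$, one below and one above $x = 1$, and they can be found by bisection. The sketch below is an illustrative solver, not a procedure taken from [9]; the function name, tolerance, and bracketing choices are assumptions.

```python
import math

def support_edges(q, tol=1e-12):
    """Bisection for the two roots of x = log(x) + q + 1 with q > 0."""
    f = lambda x: x - math.log(x) - q - 1.0    # f(1) = -q < 0 for q > 0

    def bisect(lo, hi):
        # Requires f(lo) and f(hi) to have opposite signs.
        flo = f(lo)
        while hi - lo > tol * max(1.0, hi):
            mid = 0.5 * (lo + hi)
            if (f(mid) > 0) == (flo > 0):
                lo, flo = mid, f(mid)
            else:
                hi = mid
        return 0.5 * (lo + hi)

    # f(exp(-(q+1))) = exp(-(q+1)) > 0, so this point brackets the lower root.
    lower = bisect(math.exp(-(q + 1.0)), 1.0)
    hi = 2.0
    while f(hi) <= 0:                           # expand until the sign changes
        hi *= 2.0
    upper = bisect(1.0, hi)
    return lower, upper

x1, x2 = support_edges(q=0.5)
```

Comparing these edges with the Marcenko-Pastur endpoints $(1 \pm \sqrt{c})^{2}$ for a candidate c is one simple way to explore which parameter pairs give similar supports.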


2.6  Gaussian  Method  for  Random  Matrices 

The random matrix theory concepts discussed so far focus on defining quantities that encode the eigenvalues and eigenvectors of a random matrix and proving their convergence to non-random limits. The stated results provide characterizations of the limiting eigenvalue and eigenvector Stieltjes transforms for a class of models that is particularly useful in describing an SCM of an arrival process. These characterizations hold under relatively mild conditions on the input data statistics without restricting the input data to originate from any specific probability distribution. In addition, the results prove the convergence to non-random limits without specifying the rates of convergence.

As opposed to that approach, the Gaussian method requires the arrival process to be Gaussian distributed. In addition, the rates of convergence of different quantities which encode the eigenvalues and eigenvectors naturally follow from the analysis. To the best of our knowledge, the Gaussian method was first introduced to study the behavior of mutual information in a Gaussian MIMO channel [25].

The Gaussian method is based on two important results which address the expectation and variance of functionals of Gaussian random variables. These results, called Gaussian tools, are stated here and applied in Section 3.6.

We assume that an m-by-n random matrix $\mathbf{Y}$ is modeled as

$$\mathbf{Y} = \mathbf{D}^{1/2}\mathbf{X}, \tag{2.65}$$

where

1. $\mathbf{D}^{1/2}$ is the m-by-m positive semi-definite square root of the non-negative definite diagonal matrix $\mathbf{D}$ with uniformly bounded diagonal elements.

2. $\mathbf{X}$ is an m-by-n matrix with i.i.d. complex Gaussian entries of zero mean and unit variance.

Note that the columns of the matrix $\mathbf{Y}$ model the observation vectors of a Gaussian distributed input process whose eigenvalues are the diagonal elements of the matrix $\mathbf{D}$.

An integration by parts formula characterizes the expectation of a functional of a Gaussian matrix $\mathbf{Y}$ modeled as (2.65) [25]

$$E\left[Y_{ij}f(\mathbf{Y})\right] = d_i\,E\left[\frac{\partial f(\mathbf{Y})}{\partial Y_{ij}^{*}}\right], \tag{2.66}$$

where $d_i$ is the i-th diagonal element of the matrix $\mathbf{D}$, $f(\mathbf{Y})$ is a functional of a Gaussian matrix $\mathbf{Y}$ whose $(i,j)$ entry is $Y_{ij}$, and $Y_{ij}^{*}$ is the complex conjugate of $Y_{ij}$. This result is used in order to evaluate the expectations of different quadratic forms.

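The Poincare-Nash inequality upper bounds the variance of a functional of a Gaussian matrix $\mathbf{Y}$ modeled as (2.65) [25]

$$\mathrm{var}\left(f(\mathbf{Y})\right) \le \sum_{i=1}^{m}\sum_{j=1}^{n}d_i\,E\left[\left|\frac{\partial f(\mathbf{Y})}{\partial Y_{ij}}\right|^{2} + \left|\frac{\partial f(\mathbf{Y})}{\partial Y_{ij}^{*}}\right|^{2}\right]. \tag{2.67}$$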


This result is used in Chapter 3 to prove that the variances of different functionals are upper bounded by quantities which decay to zero in the limit when m and n grow large at the same rate. Applying the Cauchy-Schwarz inequality, different functionals are then proved to be approximately uncorrelated.

2.7 Interpretation of Random Matrix Theory Results

The random matrix theory results and concepts presented in the previous sections are asymptotic in nature. That is, the number of dimensions m and the number of observations n used to compute the SCM grow large and scale linearly such that $m/n \to c \in (0, \infty)$. Since the numbers of dimensions and observations in all practical scenarios are finite, a natural question of how to exploit the random matrix theory results arises.

Before answering this question, the following numerical experiment is conducted. Suppose m sensors receive i.i.d. observation vectors of a zero mean, unit variance white noise process. A number of realizations of SCMs, each computed from n different rectangularly windowed observation vectors, are generated. Then, a set of eigenvalues is formed either by collecting all the eigenvalues computed from each realization of the SCM, or by sampling uniformly at random an eigenvalue from each realization of an SCM. In the final step, a histogram of the obtained collection of eigenvalues is computed and normalized such that its area is 1. The normalized histogram is shown in Fig. 2-2 for n = 20 and m = 10. Note that these values of m and n imply that the SCM is evaluated with 2 observations per dimension, i.e., c = 0.5. The limiting Eigenvalue Density Function corresponding to this arrival model is characterized with the Marcenko-Pastur law (2.56). The plot of this function parameterized with c = 0.5 is also shown in Fig. 2-2. Note that the Marcenko-Pastur law (i.e., more generally, the limiting Eigenvalue Density Function) corresponds to a single realization of the SCM in the large m and n limit, as opposed to a histogram which is evaluated from a number of SCM realizations.
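The experiment can be reproduced along the following lines. This is a sketch under stated assumptions: the entries are drawn as complex Gaussian (the text only requires i.i.d. zero mean, unit variance entries), all eigenvalues of every realization are pooled, and the number of trials and bins are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, trials = 10, 20, 2000
c = m / n

eigs = np.empty((trials, m))
for t in range(trials):
    X = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
    eigs[t] = np.linalg.eigvalsh(X @ X.conj().T / n)    # SCM of white noise

# Histogram of all collected eigenvalues, normalized to unit area.
hist, edges = np.histogram(eigs.ravel(), bins=40, density=True)
centers = 0.5 * (edges[1:] + edges[:-1])

# Marcenko-Pastur density for c < 1, evaluated at the bin centers.
a, b = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
mp = np.zeros_like(centers)
inside = (centers > a) & (centers < b)
mp[inside] = np.sqrt((centers[inside] - a) * (b - centers[inside])) / (
    2 * np.pi * c * centers[inside])
```

Plotting `hist` and `mp` against `centers` gives a comparison of the kind shown in Fig. 2-2; a small fraction of the finite-size eigenvalues can fall slightly outside the limiting support.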

The agreement between the normalized histogram and the Marcenko-Pastur law in Fig. 2-2 implies that even though the limiting Eigenvalue Density Function is the asymptotic result obtained when $m, n \to \infty$ such that $m/n \to c$, where c = 0.5, it fairly accurately approximates the expectation of the distribution when m and n are finite and relatively small.

Figure 2-2: Comparison between the Marcenko-Pastur law with c = 0.5 and the experimentally obtained Eigenvalue Density Function for n = 20 and m = 10.


The previous example indicates how random matrix theory results are more generally used in practice [37]. Namely, different limiting quantities (Eigenvalue Density Function, Stieltjes transforms, moments) are evaluated in the limit when $m, n \to \infty$. However, they converge rapidly with increasing m and n, and a good agreement between the analytical expressions and the experimentally obtained counterparts is achieved even for relatively small m and n. Therefore the expectations of these quantities for finite m and n are approximated with the corresponding asymptotic characterizations parameterized with finite $c = m/n$.

Formally, provided that the corresponding limiting Eigenvalue Density Function exists, the expected values of the eigenvalue and eigenvector Stieltjes transforms for an m-by-m random Hermitian matrix $\mathbf{A}_m$ are approximated by their limiting counterparts evaluated at $c = m/n$. For the eigenvalue Stieltjes transform,

$$E\left[S_{\mathbf{A}_m}(z; m, n)\right] \approx \bar{S}_{\mathbf{A}}\left(z; c = \frac{m}{n}\right), \tag{2.68}$$

and the analogous approximation holds for the eigenvector Stieltjes transform (2.69). Similarly, the expectation of the k-th order moment for finite m and n is approximated with the limiting k-th order moment

$$E\left[M_k(m, n)\right] \approx \bar{M}_k\left(\frac{m}{n}\right). \tag{2.70}$$

Chapter 3

Spatial Power Spectrum Estimation

3.1  Introduction 

Spatial spectrum estimation consists of estimating the power received at an array of sensors as a function of direction of arrival. The power of the signal arriving from a particular direction can be estimated using the Minimum Power Distortionless Response (MPDR) beamformer [52]. The MPDR beamformer array weights depend on the spatial correlation matrix of the signal received at the array of sensors. Usually, the spatial correlation matrix is unknown and must be estimated from the received data. Due to the time-varying nature of the environment and the fact that an array can contain a large number of sensors, the number of snapshots that can be collected while the environment remains approximately stationary might be insufficient to accurately estimate the correlation matrix.^1 Diagonal loading, also known as Tikhonov regularization, is extensively used to address this problem. This approach consists of adding a small regularization matrix, usually a scaled identity matrix, to the estimated spatial correlation matrix, with the goal of reducing the L2 norm of the resulting array weights and thus the sensitivity of the beamformer to the model mismatch caused by the deficient sample support. A problem that arises is how to choose an optimal regularization such that a certain performance metric, such as mean square error (MSE), is optimized. This chapter considers such a problem.

^1 Note also that the spatial coherence of the environment might be lost if the array has a long aperture. This problem is not considered here.

The statistical characterization of the diagonally loaded MPDR beamformer output has received considerable attention in the literature. The mean value and variance of the MPDR beamformer output are studied in [11] by assuming a complex Gaussian received signal, zero diagonal loading in the computation of array weights, and that the number of snapshots is larger than the number of sensors. The probability density function of the MPDR output when the received signal is complex Gaussian and at most two signals are present in the sensed field is studied in [37]. The work in [35] characterizes the expected value of the signal-to-interference-plus-noise ratio (SINR) at the MPDR beamformer output. A study of the probability density function of the SNR of the diagonally loaded MPDR beamformer output using the Gaussian method, introduced in Section 2.6, is reported in [44] and [45].

This chapter studies a diagonal loading problem for two commonly used spatial power spectrum estimators based on the MPDR beamformer [3]. The MSE between the estimated spatial power spectrum and the true spectrum, or the spectrum obtained with the ensemble correlation matrix, is adopted as a performance metric, and one of the main goals is to explore how the optimal diagonal loading changes with steering direction in a snapshot deficient regime. In doing so, we study how the power estimators behave with respect to diagonal loading, number of snapshots and number of sensors. This is done in several steps.

First, we analyze the behavior of the power estimators, their expectations, variances and MSEs in the limit when the numbers of snapshots and sensors grow large at the same rate. It is shown that both power estimators, for a fixed diagonal loading and steering direction, almost surely converge to non-random quantities in the limit when the numbers of snapshots and sensors grow large at the same rate. When the input process is Gaussian, the variances and MSEs of the power estimators are characterized in the limit, along with the rates of convergence.

Second, we study the interplay between the bias and variance in determining the optimal diagonal loading which minimizes the estimation MSE. We conjecture that the variance of the considered power estimators does not significantly impact the value of the optimal diagonal loading. Thus, the increase in MSE due to diagonal loading is primarily due to an increase in the squared bias, and the diagonal loading chosen to minimize the squared bias yields an MSE that is close to that of the loading chosen to minimize the MSE.

Third,  we  investigate  how  optimal  diagonal  loading  behaves  with  respect  to  steer¬ 
ing  direction  when  the  arrival  process  consists  of  one  or  more  plane  waves  contami¬ 
nated  with  uncorrelated  noise.  It  is  shown  that  the  optimal  diagonal  loading  increases 
as  the  steering  direction  moves  away  from  a  point  source  and  follows  an  oscillatory 
behavior. 

Finally, the MSE performances of the two power estimators are compared, and it is shown that one of them performs better (lower MSE) at the expense of increased sensitivity to diagonal loading.

The  rest  of  this  chapter  is  organized  as  follows.  A  background  on  MPDR  based 
power  estimation,  along  with  the  problem  formulation,  is  presented  in  Section  3.2.  A 
suitable  representation  of  the  power  estimators,  two  approaches  in  computing  the  true 
power,  main  assumptions  and  simulation  scenarios  used  throughout  the  chapter  are 
introduced  in  Section  3.3.  The  preliminary  insights  into  behavior  of  power  estimators 
when  viewed  as  functions  of  diagonal  loading  are  presented  in  Section  3.4.  The 
asymptotic  behavior  of  power  estimators  is  derived  in  Section  3.5.  The  derivation  of 
the  asymptotic  behavior  of  the  variance  and  MSE  corresponding  to  power  estimators 
is  presented  in  Section  3.6.  Section  3.7  analyses  how  the  squared  bias  and  variance 
impact  the  value  of  optimal  diagonal  loading.  Section  3.8  investigates  the  value 
of  the  optimal  diagonal  loading  in  scenarios  in  which  a  single  or  multiple  sources 
are  embedded  in  uncorrelated  noise.  The  relative  performances  of  the  two  power 
estimators  are  compared  in  Section  3.9.  Finally,  Section  3.10  concludes  this  chapter. 

Throughout this section, $(\mathbf{s}^{H}\mathbf{A})_i$ denotes an inner product between a vector $\mathbf{s}$ and the i-th column of a matrix $\mathbf{A}$. Similarly, $(\mathbf{A}\mathbf{s})_j$ is an inner product between the j-th row of $\mathbf{A}$ and $\mathbf{s}$. Also, $\|\mathbf{A}\|$ denotes a Frobenius norm and $x = O(a)$ means that $x \le Ka$ for some constant scalar K.

3.2 Background

An array of m sensors receives at discrete time k a signal whose Fourier coefficients in a frequency bin of interest and across the array constitute an observation vector $\mathbf{u}(k) \in \mathbb{C}^{m}$, known as a snapshot. A snapshot contains signatures of the useful signal, interferers and noise. The ensemble correlation between snapshots received at times i and j is

$$E\left[\mathbf{u}(i)\mathbf{u}^{H}(j)\right] = \mathbf{R}\,\delta_D(i - j), \tag{3.1}$$

where $\delta_D(x)$ is the Dirac delta function. The literature usually refers to the correlation matrix $\mathbf{R}$ as a temporal frequency spatial correlation matrix [52].

The MPDR beamformer passes a signal arriving from a look direction characterized by the replica vector v_s undistorted and minimizes the output power, i.e., [52]

    w_mpdr(v_s) = \arg\min_w w^H R w  s.t.  w^H v_s = 1.    (3.2)


In a simple model of a plane wave arriving from elevation angle \theta on a linear, uniform and vertical array with inter-element spacing d, the signal replica vector is given by [52]

    v_s = [1, e^{-j (2\pi/\lambda) d \cos\theta}, ..., e^{-j (2\pi/\lambda) (m-1) d \cos\theta}]^T,    (3.3)

where \lambda is the signal wavelength. Here, \theta = 0 indicates a signal propagating vertically in the downward direction and \theta = \pi/2 indicates a signal propagating horizontally (i.e., broadside to the array). While the model in (3.3) is for a plane wave received at a linear uniform array, the results developed here are equally applicable to a general signal arising in the contexts of non-linear and non-uniform arrays as well as in matched field processing problems.

The solution to (3.2) is given in closed form by [52]

    w_mpdr(v_s) = R^{-1} v_s / (v_s^H R^{-1} v_s).    (3.4)


The number of sources and their respective received powers can be estimated by steering the MPDR beamformer across all possible look directions. Hence, the power of the signal arriving from the direction corresponding to the replica vector v_s is

    P_mpdr(v_s) = w_mpdr^H(v_s) R w_mpdr(v_s) = 1 / (v_s^H R^{-1} v_s).
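The closed form (3.4) and the resulting output power can be checked numerically. The following is a minimal sketch, assuming a half-wavelength uniform linear array; the array size, source powers, interferer direction and the helper names `replica_vector` and `mpdr_weights` are illustrative choices, not values taken from the thesis.

```python
import numpy as np

def replica_vector(m, theta, d_over_lambda=0.5):
    """Plane-wave replica vector (3.3) for an m-element uniform linear array."""
    k = np.arange(m)
    return np.exp(-1j * 2 * np.pi * d_over_lambda * k * np.cos(theta))

def mpdr_weights(R, v):
    """Closed-form MPDR weights (3.4): R^{-1} v / (v^H R^{-1} v)."""
    Rinv_v = np.linalg.solve(R, v)
    return Rinv_v / (v.conj() @ Rinv_v)

m = 8
v_s = replica_vector(m, np.deg2rad(90.0))   # look direction (broadside)
v_i = replica_vector(m, np.deg2rad(70.0))   # hypothetical interferer direction
# Illustrative ensemble correlation: source + interferer + unit-power white noise.
R = 2.0 * np.outer(v_s, v_s.conj()) + 5.0 * np.outer(v_i, v_i.conj()) + np.eye(m)

w = mpdr_weights(R, v_s)
constraint = w.conj() @ v_s                  # distortionless response w^H v_s
p_out = (w.conj() @ R @ w).real              # output power w^H R w
p_closed = 1.0 / (v_s.conj() @ np.linalg.solve(R, v_s)).real
```

Under these assumptions `constraint` equals one and `p_out` agrees with `p_closed`, which is the expression 1 / (v_s^H R^{-1} v_s) for the MPDR output power.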

The correlation matrix R is usually unknown and is estimated from the observed data. We consider in this chapter a sample correlation matrix (SCM) \hat{R} evaluated from n rectangularly windowed observations, defined as

    \hat{R} = (1/n) \sum_{i=1}^{n} u(i) u^H(i).    (3.5)

As elaborated in Section 1.2, the number of snapshots n that can be collected in the interval over which the environment can be considered stationary might be insufficient to accurately estimate R, which leads to one type of model mismatch [52]. To combat the sensitivity to mismatch and/or to improve the condition number of \hat{R}, a diagonally loaded SCM \hat{R}_\delta is introduced,

    \hat{R}_\delta = \hat{R} + \delta I,    (3.6)

where \delta is a diagonal loading parameter.

The MPDR array weight vector evaluated with a diagonally loaded SCM is the solution to the following problem

    w_mpdr(v_s) = \arg\min_w w^H \hat{R} w + \delta w^H w  s.t.  w^H v_s = 1.    (3.7)

Thus, array weight vectors with large L2 norms are penalized and the L2 norm of the resulting array weight vector decreases monotonically with increasing \delta. Since the effect of insufficient sample support is a type of modeling mismatch in the SCM [52] and the sensitivity of the performance of an array weight vector to model mismatch is proportional to its L2 norm squared, computing the array weights with a diagonally loaded SCM will reduce the negative impact of insufficient sample support on processor performance. However, reducing the L2 norm of the array weights will also reduce their ability to resolve closely spaced sources or attenuate nearby interfering signals while passing the signal from the steering direction undistorted. The choice of the optimal diagonal loading balances these two factors and, as we show, depends on both the sample support of the SCM and how far the steering direction is from nearby interfering signals.
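The monotone decrease of the weight norm with the loading can be illustrated numerically. This is a small sketch under assumed conditions (white complex Gaussian snapshots, a broadside replica vector, and a hypothetical helper `loaded_weights`); it is not the thesis's simulation setup.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 10, 15
# Hypothetical data: white complex Gaussian snapshots stored as columns of U.
U = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
R_hat = U @ U.conj().T / n                   # SCM (3.5)
v_s = np.ones(m, dtype=complex)              # broadside replica vector

def loaded_weights(R_hat, v, delta):
    """MPDR weights computed with the diagonally loaded SCM (3.6)."""
    x = np.linalg.solve(R_hat + delta * np.eye(len(v)), v)
    return x / (v.conj() @ x)

deltas = np.logspace(-2, 3, 30)
norms = np.array([np.linalg.norm(loaded_weights(R_hat, v_s, d)) for d in deltas])
limit_norm = 1.0 / np.sqrt(m)                # norm of the matched filter v_s / (v_s^H v_s)
```

The sequence `norms` is non-increasing in the loading and approaches `limit_norm`, the norm of the conventional (matched filter) weight vector, as the loading grows.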


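The power of the signal arriving from direction v_s is estimated from the MPDR output power expression P_mpdr(v_s) = w_mpdr^H(v_s) R w_mpdr(v_s), where the MPDR weights are evaluated using (3.4) with a diagonally loaded SCM \hat{R}_\delta and the SCM \hat{R} is used instead of R in the quadratic form. That is [3],

    \hat{P}_a(\delta, v_s) = v_s^H \hat{R}_\delta^{-1} \hat{R} \hat{R}_\delta^{-1} v_s / (v_s^H \hat{R}_\delta^{-1} v_s)^2.    (3.8)

If the diagonally loaded SCM \hat{R}_\delta is substituted for R in the quadratic form as well, an alternative and more compact form is obtained [3],

    \hat{P}_b(\delta, v_s) = 1 / (v_s^H \hat{R}_\delta^{-1} v_s).    (3.9)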


Broadly speaking, this chapter studies how diagonal loading impacts the performance of these two adaptive beamformers in the snapshot deficient regime.

The performance of the power estimators (3.8) and (3.9) is measured via the estimation mean square error (MSE). Given a steering direction v_s, the MSE of the power estimator \hat{P} is viewed as a function of the loading \delta and represented via the bias-variance decomposition as

    MSE(\delta) = E^2[\hat{P}(\delta) - P] + var(\hat{P}(\delta)),    (3.10)

where the first term is the squared bias, which we denote by bias^2(\delta), while P is the true power of the signal arriving from the considered direction v_s. In the absence of a subscript, \hat{P} refers to either of the two estimators.

The optimal diagonal loading \delta_opt is defined as

    \delta_opt = \arg\min_\delta MSE(\delta).    (3.11)

In addition, a diagonal loading which minimizes the squared bias is denoted by \tilde{\delta}_opt and formally defined as

    \tilde{\delta}_opt = \arg\min_\delta bias^2(\delta).    (3.12)

This chapter develops an understanding of how the MSE performance in a snapshot deficient regime depends on diagonal loading. In achieving this, we study how the diagonal loading \delta, the number of sensors m and the number of snapshots n impact the squared bias and variance of the power estimators.
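The bias-variance decomposition (3.10) and the search for the MSE-minimizing loading (3.11) can be sketched with a small Monte-Carlo experiment. The setup below is an assumption made for illustration only: one source of power 10 in unit-variance white noise, the compact estimator (3.9), and the true power taken as the power obtained with a known R. A Cholesky factor is used in place of the Hermitian square root, which yields snapshots with the same distribution.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, trials = 10, 15, 400
k = np.arange(m)
steer = lambda deg: np.exp(-1j * np.pi * k * np.cos(np.deg2rad(deg)))
v_s = steer(90.0)
# Hypothetical ensemble correlation: one source (power 10) in unit-variance white noise.
R = 10.0 * np.outer(v_s, v_s.conj()) + np.eye(m)
P_true = 1.0 / (v_s.conj() @ np.linalg.solve(R, v_s)).real   # power with known R

L = np.linalg.cholesky(R)                    # L L^H = R
deltas = np.logspace(-2, 2, 25)
P_b = np.empty((trials, deltas.size))
for t in range(trials):
    X = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
    U = L @ X                                # Gaussian snapshots with correlation R
    R_hat = U @ U.conj().T / n
    lam, Q = np.linalg.eigh(R_hat)
    proj = np.abs(Q.conj().T @ v_s) ** 2     # |v_s^H q_i|^2
    # Estimator (3.9) evaluated on the whole loading grid at once.
    P_b[t] = 1.0 / (proj[None, :] / (lam[None, :] + deltas[:, None])).sum(axis=1)

bias_sq = (P_b.mean(axis=0) - P_true) ** 2
var = P_b.var(axis=0)
mse = ((P_b - P_true) ** 2).mean(axis=0)
delta_opt = deltas[np.argmin(mse)]           # empirical counterpart of (3.11)
```

With the population variance used here, `mse` equals `bias_sq + var` exactly on every grid point, and `delta_opt` is the grid value that minimizes the empirical MSE.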


3.3  Preview  and  Assumptions 

This section introduces an alternative expression for power estimators, presents a particular model for the snapshots and summarizes the assumptions used in the theoretical analysis. Also, the true power of a signal impinging upon an array is defined using two different approaches. Finally, two scenarios used in the simulations that validate the insights and derived characterizations in this chapter are presented.

3.3.1  Alternative  Expressions  for  Power  Estimators 

The power estimators \hat{P}_a and \hat{P}_b are represented via quadratic forms Q_k, defined as

    Q_k(\delta) = v_s^H \hat{R}_\delta^{-k} v_s    (3.13)

and viewed as functions of diagonal loading \delta for a fixed steering direction v_s. Therefore, the power estimator \hat{P}_b is expressed as

    \hat{P}_b(\delta, v_s) = 1 / Q_1(\delta).    (3.14)

On the other hand, solving (3.6) for \hat{R} and substituting it into (3.8) yields

    \hat{P}_a(\delta, v_s) = v_s^H \hat{R}_\delta^{-1} (\hat{R}_\delta - \delta I) \hat{R}_\delta^{-1} v_s / (v_s^H \hat{R}_\delta^{-1} v_s)^2
                           = 1 / Q_1(\delta) - \delta Q_2(\delta) / Q_1^2(\delta).    (3.15)

The analysis throughout this chapter uses these two alternative expressions for the power estimators.
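The equivalence between the direct definitions (3.8) and (3.9) and the quadratic-form expressions can be verified numerically. The sketch below uses assumed values for m, n, the loading and the steering angle, and a hypothetical helper `quad_form`.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, delta = 12, 20, 0.5
U = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
R_hat = U @ U.conj().T / n
R_delta = R_hat + delta * np.eye(m)
v_s = np.exp(-1j * np.pi * np.arange(m) * np.cos(np.deg2rad(80.0)))

def quad_form(k):
    """Q_k(delta) = v_s^H R_delta^{-k} v_s, definition (3.13)."""
    return (v_s.conj() @ np.linalg.matrix_power(np.linalg.inv(R_delta), k) @ v_s).real

Q1, Q2 = quad_form(1), quad_form(2)

# Direct evaluation of (3.8) and (3.9).
x = np.linalg.solve(R_delta, v_s)
P_a_direct = (x.conj() @ R_hat @ x).real / (v_s.conj() @ x).real ** 2
P_b_direct = 1.0 / (v_s.conj() @ x).real

# Quadratic-form expressions: 1/Q_1 and 1/Q_1 - delta * Q_2 / Q_1^2.
P_b_quad = 1.0 / Q1
P_a_quad = 1.0 / Q1 - delta * Q2 / Q1 ** 2
```

Both pairs agree to numerical precision, and `P_b_quad` exceeds `P_a_quad` by `delta * Q2 / Q1**2`, which is positive for any positive loading.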


3.3.2  Gaussian  Snapshot  Model 

Assuming the ensemble correlation matrix of the snapshots is R, the rectangularly windowed SCM (3.5) is modeled, as described in Section 2.5.2, as

    \hat{R} = (1/n) R^{1/2} X X^H R^{1/2},    (3.16)

where X is a complex m-by-n matrix with i.i.d. entries of zero mean and unit variance, and R^{1/2} is the Hermitian positive definite square root of the correlation matrix R.

In the asymptotic study of the variances associated with power estimators, we assume the input snapshots are Gaussian distributed.^ This assumption allows us to represent the SCM in a more convenient form.

The eigenvalues of the correlation matrix R are denoted by \lambda_1, \lambda_2, ..., \lambda_m and collected into a diagonal eigenvalue matrix D, such that the eigen-decomposition of R is given by R = Q D Q^H. Using (3.16), a diagonally loaded SCM (3.6) is expressed as

    \hat{R}_\delta = (1/n) Q D^{1/2} Q^H X X^H Q D^{1/2} Q^H + \delta I.    (3.17)

Since the received snapshots are Gaussian, the matrix X is unitarily invariant. Then, since Q is a unitary matrix, p(Q^H X) = p(X), where p(B) denotes the joint probability distribution of the elements of a matrix B. Therefore, statistically, the matrix Q^H X in (3.17) can be replaced by X. Hence, introducing

    Y = D^{1/2} X,    (3.18)

a diagonally loaded SCM is, in a statistical sense, equivalently represented as

    \hat{R}_\delta = (1/n) Q D^{1/2} X X^H D^{1/2} Q^H + \delta I
                   = (1/n) Q Y Y^H Q^H + \delta I
                   = Q ((1/n) Y Y^H + \delta I) Q^H.    (3.19)

^Note that the corresponding SCM \hat{R} has a Wishart distribution.


Note that the right hand sides of (3.17) and (3.19) are not equal, but have the same joint probability distribution of the elements. Consequently, the moments of their functionals are the same.

Using (3.19) and definition (3.13), a quadratic form Q_k is expressed as

    Q_k = v_s^H Q ((1/n) Y Y^H + \delta I)^{-k} Q^H v_s    (3.20)
        = t^k s^H H^k(t) s,    (3.21)

where s = Q^H v_s, t = 1/\delta and

    H(t) = ((t/n) Y Y^H + I)^{-1}    (3.22)

is a resolvent matrix.
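The passage from (3.20) to the resolvent form (3.21) is a rescaling by t = 1/\delta and can be checked on a single realization. In this sketch the eigenvalues of R, the unitary matrix and the replica vector are arbitrary illustrative choices, and `q_direct` and `q_resolvent` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, delta = 6, 9, 0.7
lam = np.linspace(1.0, 4.0, m)               # hypothetical eigenvalues of R
# Random unitary matrix playing the role of the eigenvector matrix Q.
Qm, _ = np.linalg.qr(rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m)))
X = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
Y = np.sqrt(lam)[:, None] * X                # Y = D^{1/2} X, (3.18)
R_delta = Qm @ (Y @ Y.conj().T / n + delta * np.eye(m)) @ Qm.conj().T   # (3.19)

v_s = rng.standard_normal(m) + 1j * rng.standard_normal(m)
s = Qm.conj().T @ v_s
t = 1.0 / delta
H = np.linalg.inv(t / n * Y @ Y.conj().T + np.eye(m))   # resolvent (3.22)

def q_direct(k):
    """Q_k from definition (3.13) applied to the loaded SCM."""
    return (v_s.conj() @ np.linalg.matrix_power(np.linalg.inv(R_delta), k) @ v_s).real

def q_resolvent(k):
    """Q_k from the resolvent form (3.21): t^k s^H H^k(t) s."""
    return (t ** k * (s.conj() @ np.linalg.matrix_power(H, k) @ s)).real
```

For k = 1 and k = 2 the two functions return the same value up to rounding.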


Note that the quadratic forms Q_k are functionals of the Gaussian matrix Y (3.18), which satisfies the conditions of the integration by parts formula (2.66) and the Poincaré-Nash inequality (2.67). Thus, under the assumption that the received snapshots are Gaussian distributed, we evaluate the variances of the power estimators using the Gaussian method.

3.3.3 Assumptions


The main assumptions used throughout this chapter are listed here.

1. The number of snapshots n and number of sensors m are of the same order. If not specified otherwise, this means that

    0 < \liminf m/n \le \limsup m/n < \infty.    (3.23)

Note that m/n does not necessarily need to converge. All that is required is that the ratio be non-zero and finite. Note that this condition also implies that m and n scale linearly in the limit.

2. The eigenvalues of the ensemble correlation matrix R are uniformly upper bounded for all m, i.e.,

    \max{\lambda_1, \lambda_2, ..., \lambda_m} < D_m < \infty.    (3.24)

3. The eigenvalues of the SCM \hat{R}, denoted by \hat{\lambda}_1, ..., \hat{\lambda}_m, are uniformly upper bounded for all m, i.e.,

    \max{\hat{\lambda}_1, ..., \hat{\lambda}_m} < D_m < \infty.    (3.25)

Note that these eigenvalues are lower bounded by some d_m > 0.

4. The norm of the signal replica vector v_s is uniformly upper bounded for all m such that

    ||v_s|| = ||s|| < S_m = O(\sqrt{m}).    (3.26)

Note that for any array in a plane wave environment where the magnitude of each element of the replica vector equals 1, ||v_s|| = \sqrt{m}.

3.3.4 Definitions of True Power

The estimation MSE performance of power estimators characterizes how "close" an estimate is to the true value of the power. While there is no ambiguity about what true power is, we introduce here an alternative definition which is in some sense better suited for the study of estimation performance in the snapshot deficient regime.


Traditional  Definition  of  True  Power 


For completeness, the traditional way of defining the true power corresponding to a specific arrival model is presented in this part.

Namely, if the received signal is composed of a number of signals, each having power P_j and impinging on the array in the direction described by a replica vector v_j, the true power in the steering direction v_s is defined as [26]

    P = \sigma^2(v_s) / m,          if v_s \ne v_j,
    P = P_j + \sigma^2(v_s) / m,    if v_s = v_j,    (3.27)

where \sigma^2(v_s) is the level of the noise power spectral density in the direction described by v_s. We assume the noise is uncorrelated and the inter-element spacing is half-a-wavelength such that \sigma^2(v_s) =


Alternative  Definition  of  True  Power 


Alternatively, true power can be defined as the power that would be estimated if the ensemble correlation matrix R was known.^ Since no loading is needed in this case, this power is given by

    P = 1 / (v_s^H R^{-1} v_s).    (3.28)


The justification for this definition is found in the fact that even when the correlation matrix R is known, the estimators \hat{P}_a and \hat{P}_b are unable to accurately estimate the power when steering very close to the source direction. When the SCM \hat{R} is estimated from a limited number of snapshots, the best that one can do is to approach the performance achieved with a known correlation matrix R.

^Traditional, in the sense that it is usually used for the power density spectrum estimation of time series data.

^In fact, this is a more useful definition in the context of spectrum estimation using an array of sensors.

3.3.5  Simulation  Scenarios 

The theoretical results developed in this chapter are validated via Monte-Carlo simulations. A standard, vertical, linear array with \lambda/2 (half-a-wavelength) separation of the elements is considered in the simulations. In addition, a spatially uncorrelated, zero-mean noise with a variance of one corrupts the signal snapshots. The following two different arrival structures are considered.

- In Scenario 1, 2 signals are arriving at elevation angles of 90° and 92° with respect to the broadside of the array and each has power 10 (i.e., the SNR is 10 dB). The array contains 30 sensors and 50 snapshots are used to estimate the SCM. Note that the ratio between the array aperture and wavelength is 15. Also, note that c = 0.6 and the normalized trace of the ensemble correlation matrix of the input process is 21.

- In Scenario 2, 2 signals are arriving at elevation angles of 90° and 94° with respect to the broadside of the array. Their SNRs are 1 dB and 5 dB, respectively. The array has 40 sensors and the number of snapshots available to estimate the SCM is 25. Note that the ratio between the array aperture and wavelength is 20. Also note that c = 1.6 and the normalized trace of the ensemble correlation matrix is 5.42.
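The quoted values of c and of the normalized trace follow directly from the scenario parameters. The sketch below rebuilds the two ensemble correlation matrices under the assumption that the replica vector follows (3.3) with half-wavelength spacing and that the aperture is counted as m times the spacing; the function and dictionary names are illustrative.

```python
import numpy as np

def steering(m, elev_deg):
    """Replica vector (3.3) for half-wavelength spacing."""
    k = np.arange(m)
    return np.exp(-1j * np.pi * k * np.cos(np.deg2rad(elev_deg)))

def ensemble_correlation(m, angles_deg, snr_db, noise_var=1.0):
    """Sum of rank-one source terms plus spatially white noise."""
    R = noise_var * np.eye(m, dtype=complex)
    for ang, snr in zip(angles_deg, snr_db):
        v = steering(m, ang)
        R += noise_var * 10 ** (snr / 10) * np.outer(v, v.conj())
    return R

scenarios = {
    1: dict(m=30, n=50, angles_deg=(90.0, 92.0), snr_db=(10.0, 10.0)),
    2: dict(m=40, n=25, angles_deg=(90.0, 94.0), snr_db=(1.0, 5.0)),
}
summary = {}
for key, sc in scenarios.items():
    R = ensemble_correlation(sc["m"], sc["angles_deg"], sc["snr_db"])
    summary[key] = dict(
        c=sc["m"] / sc["n"],                              # ratio m/n
        normalized_trace=np.trace(R).real / sc["m"],      # tr(R) / m
        aperture_over_lambda=sc["m"] * 0.5,               # assumed: m sensors times lambda/2
    )
```

Because every replica vector has squared norm m, the normalized trace is simply the sum of the source powers plus the noise variance, which gives 21 for Scenario 1 and about 5.42 for Scenario 2.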


3.4 Preliminary Results on Behavior of Power Estimators

The power estimators \hat{P}_a and \hat{P}_b, viewed as functions of diagonal loading \delta, are studied in this section. In particular, some results concerning the behavior of squared bias and variance with respect to diagonal loading \delta are presented. Throughout the section, we do not explicitly show the dependence on steering direction in the notation for power estimators, i.e., with slight abuse of notation we write \hat{P}_a(\delta) and \hat{P}_b(\delta).


3.4.1  Dependence  of  Squared  Bias  on  Diagonal  Loading 

A study of how the squared bias, bias^2(\delta), depends on \delta is centered around exploring how the estimators and their expectations depend on \delta. The results concerning the behavior of the estimators \hat{P}_a and \hat{P}_b are summarized in the following lemma.

Lemma 3.1. For any (Hermitian non-negative definite) SCM \hat{R} and for \delta \ge 0, under assumption 3, the following holds.

1. \hat{P}_b(\delta) \ge \hat{P}_a(\delta), with equality if and only if \delta = 0.

2. \hat{P}_a(\delta) is monotonically increasing for 0 < \delta < \infty, unless all the eigenvalues of \hat{R} are equal. Its slope is zero at \delta = 0^+ (i.e., when \delta approaches 0 from the positive side) and when \delta \to \infty.

3. \hat{P}_b(\delta) is strictly monotonically increasing for all 0 < \delta < \infty.


Proof. 1. From (3.14) and (3.15), the two power estimators are related as

    \hat{P}_a(\delta) = \hat{P}_b(\delta) - \delta Q_2 / Q_1^2.

Since the sample eigenvalues are uniformly upper bounded, Q_2 > 0. Therefore, \hat{P}_b \ge \hat{P}_a. The equality holds when \delta = 0.


2. 


Denoting the eigenvalues and eigenvectors of \hat{R} with \hat{\lambda}_i and \hat{q}_i, the power estimator \hat{P}_a is, using (3.8), expressed as

Since the derivative of \hat{P}_b with respect to \delta is positive, the estimator \hat{P}_b is a strictly increasing function. Its slope is positive at \delta = 0. When \delta \to \infty, its slope converges to 1/m.

□ 

In words, both estimators \hat{P}_a and \hat{P}_b are monotonically non-decreasing functions of \delta. While \hat{P}_a converges to a finite value as \delta \to \infty, the estimator \hat{P}_b is unbounded. For the same non-zero loading \delta, the estimator \hat{P}_b is greater than \hat{P}_a.
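The properties stated in Lemma 3.1 can be observed on a single SCM realization by writing both estimators in terms of the sample eigenvalues and the projections of the replica vector onto the sample eigenvectors. The sketch below assumes white Gaussian snapshots with n > m and a unit-modulus replica vector, so that ||v_s||^2 = m; the helper `estimators` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 10, 20
X = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
R_hat = X @ X.conj().T / n
lam, Q = np.linalg.eigh(R_hat)
v_s = np.exp(-1j * np.pi * np.arange(m) * np.cos(np.deg2rad(85.0)))
proj = np.abs(Q.conj().T @ v_s) ** 2         # |v_s^H q_i|^2

def estimators(delta):
    """Return (P_a, P_b) from the quadratic forms Q_1 and Q_2."""
    Q1 = np.sum(proj / (lam + delta))
    Q2 = np.sum(proj / (lam + delta) ** 2)
    return 1.0 / Q1 - delta * Q2 / Q1 ** 2, 1.0 / Q1

deltas = np.logspace(-3, 4, 200)
P_a, P_b = np.array([estimators(d) for d in deltas]).T
# Finite-difference slope of P_b at a large loading (step of 1).
slope_b_large = estimators(1e4 + 1.0)[1] - estimators(1e4)[1]
# Matched-filter limit of P_a for a unit-modulus replica vector.
P_a_limit = (v_s.conj() @ R_hat @ v_s).real / m ** 2
```

On this realization `P_b` lies above `P_a` for every positive loading, both curves are non-decreasing, the slope of `P_b` at large loading is close to 1/m, and `P_a` levels off near `P_a_limit`.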

These results carry over to E[\hat{P}_a(\delta)] and E[\hat{P}_b(\delta)] whenever the order of differentiation and expectation can be interchanged. The following lemma summarizes how the expected values of the power estimators behave when \delta = 0 and \delta \to \infty. The result corresponding to the case of \delta = 0 follows from later development, but is presented here for completeness.

Lemma 3.2. Assuming the expectation is taken over all possible realizations of the SCM \hat{R} and \delta \ge 0,

1. For any fixed m and n, \lim_{\delta \to \infty} E[\hat{P}_a(\delta)] = (1/m^2) v_s^H R v_s.

2. In the limit when m, n \to \infty at the same rate such that m/n \to c \in (0, 1), for both estimators, \hat{P}(0) \to (1 - c) / (v_s^H R^{-1} v_s) almost surely.

Proof. 1. When \delta \to \infty, the MPDR beamformer becomes a matched filter (i.e., the conventional beamformer) with array weight vector w = v_s / (v_s^H v_s). Therefore,

    \lim_{\delta \to \infty} \hat{P}_a(\delta) = (1/m^2) v_s^H \hat{R} v_s.    (3.30)

The proof is completed after taking the expectation of both sides of (3.30).

The functional dependence of E[\hat{P}_a] and E[\hat{P}_b] on diagonal loading \delta is visualized in Fig. 3-1. Although the plot is generated for a specific scenario, this behavior is typical.

So far we have considered how power estimators and their expectations depend on diagonal loading. We now examine how the squared bias associated with the power estimators depends on diagonal loading. The behavior of the squared bias is directly implied by the preceding results and is summarized as follows. For a non-negative diagonal loading \delta,


1. The squared bias corresponding to the power estimator \hat{P}_a attains a unique global minimum, equal to zero, at a finite, non-zero \tilde{\delta}_opt if the true power P satisfies E[\hat{P}_a(0)] < P < \lim_{\delta \to \infty} E[\hat{P}_a(\delta)]. If the true power P > \lim_{\delta \to \infty} E[\hat{P}_a(\delta)], then the squared bias corresponding to \hat{P}_a decays as \delta \to \infty.

2. The squared bias corresponding to the power estimator \hat{P}_b attains a unique global minimum, equal to zero, at a finite, non-zero \tilde{\delta}_opt if the true power P satisfies E[\hat{P}_b(0)] < P < \infty.

3. If the true power P < E[\hat{P}(0)], then the squared bias of either of the estimators is minimized at \tilde{\delta}_opt = 0.

4. The squared bias is monotonically decreasing for 0 < \delta < \tilde{\delta}_opt and monotonically increasing for \delta > \tilde{\delta}_opt.

Figure 3-1: Typical behavior of the expectations of the estimators \hat{P}_a and \hat{P}_b. The scenario used to generate the plots is described in the caption of Fig. 3-28. The steering angle is 89°. The normalized trace of the ensemble correlation matrix of the considered model is 21.


3.4.2  Dependence  of  Variance  on  Diagonal  Loading 

This part presents results which show how the variances of the power estimators \hat{P}_a and \hat{P}_b behave with respect to loading \delta.

First, under assumption 2, the norm of the SCM \hat{R} is uniformly bounded and consequently \hat{P}_a, \hat{P}_a^2, \hat{P}_b and \hat{P}_b^2 are all uniformly bounded. This implies that the first and second moments of the estimators are also bounded. Therefore, the variances of the power estimators are finite for a finite loading \delta.

Note that as \delta \to \infty, \hat{P}_a(\delta) converges to the power estimator associated with the matched filter, whose variance is finite. Consequently, the variance of \hat{P}_a(\delta) is finite as \delta \to \infty.

In general, var(\hat{P}_a(\delta)) \ne var(\hat{P}_b(\delta)). However, it turns out that these variances are equal when \delta \to \infty. From that result, we conclude that the variance of \hat{P}_b is finite in the limit \delta \to \infty. In addition, if the input process is Gaussian, we are able to analytically characterize this variance. The results are summarized and proved in the following lemma.

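Lemma 3.3. Under assumption 2, the following holds.

1. \lim_{\delta \to \infty} var(\hat{P}_a(\delta)) = \lim_{\delta \to \infty} var(\hat{P}_b(\delta)).

2. If the received snapshots u(i) are Gaussian distributed, then

    \lim_{\delta \to \infty} var(\hat{P}_a(\delta)) = (v_s^H R v_s)^2 / (n m^4).    (3.32)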

Proof. 1. The difference between the two variances is, after simple algebraic manipulations, expressed as

    var(\hat{P}_b) - var(\hat{P}_a) = E[AB] - E[A] E[B],    (3.33)

where A = \hat{P}_b + \hat{P}_a = 2/Q_1 - \delta Q_2 / Q_1^2 and B = \hat{P}_b - \hat{P}_a = \delta Q_2 / Q_1^2. The goal is to find the limit of (3.33) when \delta \to \infty. This is solved by considering the leading order terms in A and B. To do so, the quadratic form Q_k, k \in {1, 2}, is expressed as

    Q_k = t^k v_s^H (t \hat{R} + I)^{-k} v_s = t^k \sum_{i=1}^{m} |v_s^H \hat{q}_i|^2 / (t \hat{\lambda}_i + 1)^k,    (3.34)

where t = 1/\delta. In the t \to 0 regime and since the eigenvalues \hat{\lambda}_i are upper bounded, t \hat{\lambda}_i \ll 1. Therefore, using the Taylor series expansion of (1 + t \hat{\lambda}_i)^{-k} yields

    Q_1 = \sum_{j=0}^{\infty} (-1)^j t^{j+1} a_j,    (3.35)

    Q_2 = \sum_{j=0}^{\infty} (-1)^j (j + 1) t^{j+2} a_j,    (3.36)

where a_j = v_s^H \hat{R}^j v_s. Substituting Q_1 and Q_2 into the expressions for A and B (with a_0 = v_s^H v_s = m) yields

    A = 1/(m t) + 2 a_1 / m^2 + 3 (a_1^2 / m^3 - a_2 / m^2) t + O(t^2),    (3.37)

    B = 1/(m t) + (a_2 / m^2 - a_1^2 / m^3) t + O(t^2).    (3.38)

Simple algebra further shows that E[AB] - E[A] E[B] = O(t), and thus

    \lim_{\delta \to \infty} [var(\hat{P}_b(\delta)) - var(\hat{P}_a(\delta))] = 0.    (3.39)

Since \lim_{\delta \to \infty} var(\hat{P}_a(\delta)) exists, the two variances become equal when \delta \to \infty.


2. Pushing the limit operator inside the expectations and using (3.30) yields

    \lim_{\delta \to \infty} var(\hat{P}_a(\delta)) = var((1/m^2) v_s^H \hat{R} v_s)
        = (1/m^4) ( E[(v_s^H \hat{R} v_s)^2] - E^2[v_s^H \hat{R} v_s] ).    (3.40)


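Substituting \hat{R}_\delta from (3.19) with \delta = 0 into (3.40) yields

    \lim_{\delta \to \infty} var(\hat{P}_a(\delta)) = (1/(n^2 m^4)) ( E[(s^H Y Y^H s)^2] - E^2[s^H Y Y^H s] ),    (3.41)

where s = Q^H v_s. The first expectation in (3.41) is evaluated by unfolding the squared quadratic form and applying the integration by parts formula (2.66),

    E[(s^H Y Y^H s)^2]
        = \sum_{i,j,k,p,q,r} s_i^* s_k s_p^* s_r E[Y_{ij} Y_{kj}^* Y_{pq} Y_{rq}^*]
     (a)= \sum_{i,j,k,p,q,r} s_i^* s_k s_p^* s_r \lambda_i E[ \partial (Y_{kj}^* Y_{pq} Y_{rq}^*) / \partial Y_{ij}^* ]
        = \sum_{i,j,k,p,q,r} s_i^* s_k s_p^* s_r \lambda_i E[ \delta_{ki} Y_{pq} Y_{rq}^* + Y_{kj}^* Y_{pq} \delta_{ir} \delta_{jq} ]
     (b)= \sum_{i,j,k,p,q,r} s_i^* s_k s_p^* s_r \lambda_i ( \lambda_p \delta_{ki} \delta_{pr} + \lambda_p \delta_{kp} \delta_{jq} \delta_{ir} )
        = (s^H D s)^2 (n^2 + n),    (3.42)

where \delta_{ki} with two indices denotes the Kronecker delta, (a) is obtained by applying (2.66) with f(Y) = Y_{kj}^* Y_{pq} Y_{rq}^*, while (b) is obtained in a similar way with f(Y) being Y_{rq}^* for the first term and Y_{kj}^* for the second term in the summand. Substituting (3.42) into (3.41) and noting that E[s^H Y Y^H s] = n s^H D s and s^H D s = v_s^H R v_s, the proof is completed.

□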


Note that from the last part of the preceding lemma, we conclude that for a Gaussian input process, if the number of sensors and observations are of the same order, as formalized in assumption 1, the variance of both estimators is O(m^{-3}) (recall that under assumption 4, ||v_s||^2 = O(m)).
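The limiting variance (3.32) is the variance of the matched-filter power estimate and can be checked by Monte-Carlo simulation. The sketch below assumes complex Gaussian snapshots, a unit-modulus replica vector (so that ||v_s||^2 = m) and an illustrative correlation matrix with one source off the look direction; none of these values come from the thesis.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, trials = 8, 12, 20000
k = np.arange(m)
v_s = np.exp(-1j * np.pi * k * np.cos(np.deg2rad(90.0)))
v_1 = np.exp(-1j * np.pi * k * np.cos(np.deg2rad(80.0)))
# Hypothetical ensemble correlation: one source (power 4) plus unit-variance white noise.
R = 4.0 * np.outer(v_1, v_1.conj()) + np.eye(m)
L = np.linalg.cholesky(R)                    # L L^H = R

# Draw all snapshots at once: array of shape (trials, m, n).
X = (rng.standard_normal((trials, m, n)) + 1j * rng.standard_normal((trials, m, n))) / np.sqrt(2)
U = L @ X
# Matched-filter power: v_s^H R_hat v_s / m^2 = mean_i |v_s^H u(i)|^2 / m^2.
y = np.einsum("i,tin->tn", v_s.conj(), U)
P_mf = (np.abs(y) ** 2).mean(axis=1) / m ** 2

var_mc = P_mf.var()
var_theory = (v_s.conj() @ R @ v_s).real ** 2 / (n * m ** 4)   # right-hand side of (3.32)
```

The empirical variance `var_mc` should agree with `var_theory` up to Monte-Carlo error, which is on the order of a percent for this number of trials.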


3.5  Asymptotic  Behavior  of  Power  Estimators 

This section characterizes the asymptotic behavior of power estimators \hat{P}_a and \hat{P}_b for a fixed diagonal loading \delta. The theoretical results characterizing how power estimators and their expectations behave in the limit when m, n \to \infty such that m/n \to c \in (0, \infty) are presented in the first part. Then, an asymptotically unbiased spatial power estimator is proposed. Finally, in the last part, the obtained results are validated via simulations. Throughout the section, we do not explicitly show the dependence on diagonal loading \delta in the notation for power estimators, i.e., with slight abuse of notation we write \hat{P}_a(v_s) and \hat{P}_b(v_s).


3.5.1  Asymptotic  Analysis  of  Power  Estimators 

The asymptotic analysis of power estimators \hat{P}_a and \hat{P}_b is performed in the limit when both the number of snapshots n and the number of sensors m grow large at the same rate so that m/n \to c, c \ne 0. The assumption that m and n grow at the same rate captures the fact that we often do not have enough snapshots to accurately estimate the input correlation. On the other hand, the assumption that m, n \to \infty does not have a practical justification, but is simply a mathematical necessity which enables analytical derivations.

The following lemma characterizes the asymptotic behavior of power estimators \hat{P}_a and \hat{P}_b.


Lemma 3.4. Under assumptions 2 and 4, in the limit when m, n \to \infty at the same rate, i.e., m/n \to c \in (0, \infty),

    \hat{P}_a(v_s) \to (\bar{Q}_1 - \delta \bar{Q}_2) / \bar{Q}_1^2  a.s.    (3.43)

and

    \hat{P}_b(v_s) \to 1 / \bar{Q}_1  a.s.,    (3.44)

where \bar{Q}_1 and \bar{Q}_2 are the quantities that the quadratic forms Q_1 and Q_2 converge to in the limit. They are given by

    \bar{Q}_1 = v_s^H [ \sum_{k=1}^{m} q_k q_k^H / (\lambda_k (1 - c + c \delta M_1) + \delta) ] v_s    (3.45)

and

    \bar{Q}_2 = v_s^H [ \sum_{k=1}^{m} q_k q_k^H (\lambda_k c (M_1 - \delta M_2) + 1) / (\lambda_k (1 - c + c \delta M_1) + \delta)^2 ] v_s,    (3.46)

where M_1 and M_2 are the limiting moments corresponding to the diagonally loaded SCM of the rectangularly windowed snapshots.

Proof. The convergence results (3.43) and (3.44) directly follow from the convergence of quadratic forms Q_1 and Q_2, and (3.14) and (3.15). The convergence of Q_1 and Q_2 is proved using Theorem 2.2.

The empirical Eigenvector Stieltjes Transforms corresponding to the SCM \hat{R} and the diagonally loaded SCM \hat{R}_\delta are defined with respect to the signal replica vector (i.e., s_1 = s_2 = v_s),

    F_{\hat{R}}(z) = v_s^H (\hat{R} - z I)^{-1} v_s,    (3.47)

    F_{\hat{R}_\delta}(z) = v_s^H (\hat{R}_\delta - z I)^{-1} v_s.    (3.48)

These transforms are, using (3.6) and (2.11), related as

    F_{\hat{R}_\delta}(z) = F_{\hat{R}}(z - \delta).    (3.49)

The existence and characterization of the limiting Eigenvector Stieltjes Transform corresponding to a random matrix model (3.16), used to describe the SCM \hat{R}, is stated in Theorem 2.2. With the assumptions that the spectral norm of the ensemble correlation matrix R and the norm of the signal replica vector are uniformly bounded (assumptions 2 and 4), the conditions of Theorem 2.2 are met. Therefore,

    F_{\hat{R}}(z) \to \bar{F}_{\hat{R}}(z)  a.s.,    (3.50)

where

    \bar{F}_{\hat{R}}(z) = \sum_{k=1}^{m} v_s^H q_k q_k^H v_s / (\lambda_k (1 - c - c z \bar{S}_{\hat{R}}(z)) - z).    (3.51)

Recall that \bar{S}_{\hat{R}}(z) is the limiting Stieltjes transform corresponding to the SCM \hat{R}.

The existence of the limiting eigenvector Stieltjes transform corresponding to the diagonally loaded SCM directly follows from (3.49) and (3.50). It is characterized using (3.49) by

    \bar{F}_{\hat{R}_\delta}(z) = \bar{F}_{\hat{R}}(z - \delta).    (3.52)

The quadratic form Q_1 is related to the empirical eigenvector Stieltjes transform corresponding to the diagonally loaded SCM as

    Q_1 = \lim_{z \to 0^-} F_{\hat{R}_\delta}(z).    (3.53)

Recall that z \to 0^- compactly represents Im{z} = 0 and Re{z} \to 0^-. Note that this limit exists because either \delta > 0 or n > m with zero probability of receiving two snapshots that are identical up to a scaling.

Hence, we conclude from (3.50) and (3.53) that the quadratic form Q_1 almost surely converges to the non-random \bar{Q}_1, given by

    \bar{Q}_1 = \lim_{z \to 0^-} \bar{F}_{\hat{R}}(z - \delta).

Finally, taking the above limit and recalling that the limiting moment

    M_1 = \lim_{z \to 0^-} \bar{S}_{\hat{R}}(z - \delta),    (3.54)

we obtain (3.45).

The quadratic form Q_2 can be expressed in terms of the first derivative of the empirical eigenvector Stieltjes transform corresponding to the diagonally loaded SCM (provided it is analytic) as

    Q_2 = \lim_{z \to 0^-} (d/dz) F_{\hat{R}_\delta}(z).    (3.55)

As already pointed out, this limit exists because either \delta > 0 or n > m and the probability of receiving two snapshots identical up to a scaling is zero. Therefore, from (3.50) and (3.55), the quadratic form Q_2 almost surely converges to the non-random \bar{Q}_2, given by

    \bar{Q}_2 = \lim_{z \to 0^-} (d/dz) \bar{F}_{\hat{R}}(z - \delta).

Taking the above limit and recalling (3.54) and

    M_2 = \lim_{z \to 0^-} (d/dz) \bar{S}_{\hat{R}}(z - \delta),    (3.56)

yields (3.46).

□


The limiting moments M_1 and M_2 in (3.45) and (3.46) are those of the diagonally loaded SCM evaluated from rectangularly windowed snapshots (observations). They are characterized in Section 2.5.2 and given as the solutions to the fixed point equations (2.51) and (2.52).
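To make the use of (3.45) concrete, the sketch below computes M_1 and the deterministic equivalent of Q_1 and compares 1 / \bar{Q}_1 with a Monte-Carlo average of the estimator (3.9). The fixed point equations (2.51) and (2.52) are not reproduced in this chapter, so the code assumes the form M_1 = (1/m) \sum_k 1 / (\lambda_k (1 - c + c \delta M_1) + \delta), which is what (3.45) gives when v_s v_s^H is replaced by I/m. The solver `limiting_moment_M1`, the bisection bracket and the scenario values are illustrative assumptions.

```python
import numpy as np

def limiting_moment_M1(lam, c, delta, iters=200):
    """Solve M = mean_k 1 / (lam_k (1 - c + c delta M) + delta) by bisection.

    Assumed form of the fixed point equation; the root is bracketed by
    [max(0, (1 - 1/c) / delta), 1 / delta], where the map is decreasing in M.
    """
    g = lambda M: np.mean(1.0 / (lam * (1.0 - c + c * delta * M) + delta))
    lo, hi = max(0.0, (1.0 - 1.0 / c) / delta), 1.0 / delta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def Q1_bar(lam, proj, c, delta):
    """Deterministic equivalent (3.45) with proj_k = |v_s^H q_k|^2."""
    M1 = limiting_moment_M1(lam, c, delta)
    return np.sum(proj / (lam * (1.0 - c + c * delta * M1) + delta))

rng = np.random.default_rng(6)
m, n, delta, trials = 30, 50, 1.0, 300
k = np.arange(m)
steer = lambda deg: np.exp(-1j * np.pi * k * np.cos(np.deg2rad(deg)))
R = np.eye(m, dtype=complex)
for ang in (90.0, 92.0):                     # two arrivals of power 10 in unit-variance noise
    R += 10.0 * np.outer(steer(ang), steer(ang).conj())
lam, Qe = np.linalg.eigh(R)
v_s = steer(89.0)
proj = np.abs(Qe.conj().T @ v_s) ** 2

Pb_asym = 1.0 / Q1_bar(lam, proj, m / n, delta)   # asymptotic value of E[P_b]

L = np.linalg.cholesky(R)
Pb_mc = 0.0
for _ in range(trials):
    X = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
    U = L @ X
    R_delta = U @ U.conj().T / n + delta * np.eye(m)
    Pb_mc += 1.0 / (v_s.conj() @ np.linalg.solve(R_delta, v_s)).real / trials
```

Bisection is used instead of plain fixed-point iteration because the bracket keeps every denominator positive for any c > 0. The Monte-Carlo average `Pb_mc` is expected to lie close to `Pb_asym` already at these moderate dimensions.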

In comparison to the approach used here, note that [35] characterizes \bar{Q}_1 as a part of the study of the SINR at the MPDR output. The deterministic equivalent of Q_1 obtained therein is given in terms of the unique solution B_q (in a certain set) to

    B_q = (1/m) \sum_{j=1}^{m} \lambda_j (1 + c B_q) / (\lambda_j + \delta (1 + c B_q)).    (3.57)

Also, the study in [35] uses the true correlation R rather than \hat{R} in the definition of the power estimator (3.8). We believe that the estimator as defined in (3.8) is better suited for spatial spectrum estimation.


3.5.2  Asymptotically  Unbiased  Spatial  Power  Estimator 


The convergence result established in the previous part is used here to develop an asymptotically unbiased power estimator with respect to the alternatively defined true power (3.28).

First, assuming that the number of snapshots n is greater than the number of sensors m, recall that the power estimators \hat{P}_a and \hat{P}_b are equal for diagonal loading \delta = 0. Substituting \delta = 0 in (3.44) yields that in the limit m, n \to \infty such that n > m and m/n \to c \in (0, 1),

    \hat{P}_b(v_s) \to (1 - c) / (v_s^H R^{-1} v_s)  a.s.    (3.58)

Recalling that the estimated spatial power for a known correlation matrix R is given by 1 / (v_s^H R^{-1} v_s), we conclude that the unloaded power estimator is asymptotically biased with respect to the true power evaluated using (3.28).

The asymptotically unbiased power estimator \hat{P}^{u}(v_s) is then obtained from (3.58) by dividing the unloaded estimator \hat{P}_b(v_s) by 1 - c and is given by

    \hat{P}^{u}(v_s) = (1 / (1 - c)) \cdot 1 / (v_s^H \hat{R}^{-1} v_s),    (3.59)

where m is the number of sensors, n is the number of observations and c = m/n.

Note that while division by (1 - c) in (3.59) produces an asymptotically unbiased estimator, the variance of the obtained estimator increases 1/(1 - c)^2 times. This has a detrimental effect on the quality of the estimation if the number of observations n is only slightly higher than the number of sensors m. For example, if c = 0.9, the variance increases 100 times!
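The bias factor in (3.58) and the effect of the correction (3.59) can be observed in a short Monte-Carlo experiment. This is a sketch under assumed conditions: Gaussian snapshots, a single source of power 5 in the look direction, m = 20 and n = 80 (so c = 0.25); these values are illustrative and not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, trials = 20, 80, 2000
c = m / n
k = np.arange(m)
v_s = np.exp(-1j * np.pi * k * np.cos(np.deg2rad(90.0)))
# Hypothetical ensemble correlation: source in the look direction plus white noise.
R = 5.0 * np.outer(v_s, v_s.conj()) + np.eye(m)
P_true = 1.0 / (v_s.conj() @ np.linalg.solve(R, v_s)).real    # true power (3.28)
L = np.linalg.cholesky(R)

P_unloaded = np.empty(trials)
for t in range(trials):
    X = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
    U = L @ X
    R_hat = U @ U.conj().T / n
    P_unloaded[t] = 1.0 / (v_s.conj() @ np.linalg.solve(R_hat, v_s)).real

P_corrected = P_unloaded / (1.0 - c)               # estimator (3.59)
bias_ratio = P_unloaded.mean() / P_true            # expected to be close to 1 - c
var_ratio = P_corrected.var() / P_unloaded.var()   # equals 1 / (1 - c)^2 by construction
```

At these finite dimensions `bias_ratio` is close to, but not exactly, 1 - c, while `var_ratio` equals 1/(1 - c)^2 because the correction is a deterministic scaling.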


3.5.3  Approximate  Expectation  of  Power  Estimators 


As has been shown, both power estimators almost surely converge to corresponding deterministic quantities in the limit when m, n \to \infty at the same rate such that m/n \to c \in (0, \infty). According to the dominated convergence theorem [7], if power estimators \hat{P}_a and \hat{P}_b are uniformly bounded as m, n \to \infty, their expectations converge to the same deterministic quantities. Indeed, under assumptions 2, 3 and 4, \hat{P}_a is uniformly upper bounded for all m and n = n(m). Similarly, the power estimator \hat{P}_b is, under the same assumptions, uniformly bounded for all m and n = n(m), provided that the diagonal loading is finite. Therefore, the power estimators converge in expectation,

    E[\hat{P}_a(v_s)] \to (\bar{Q}_1 - \delta \bar{Q}_2) / \bar{Q}_1^2,    (3.60)

    E[\hat{P}_b(v_s)] \to 1 / \bar{Q}_1.    (3.61)

Although the established convergence results hold when $m, n \to \infty$ at the same rate such that $m/n \to c$, due to rapid convergence the asymptotic expressions accurately approximate the expectations of the power estimators for relatively small $n$ and $m$. Therefore, for finite $n$ and $m$,

$$E\left[\hat{P}_a\right] \approx \frac{Q_1 - \delta Q_2}{Q_1^2}, \tag{3.62}$$

$$E\left[\hat{P}_b\right] \approx \frac{1}{Q_1}, \tag{3.63}$$

where $Q_1$ and $Q_2$ are evaluated using (3.45) and (3.46) with the given $m$, $n$ and $c = m/n$. These approximations are validated via simulations in the following part.


3.5.4  Numerical  Validation  of  Derived  Expressions 

Approximations (3.62) and (3.63) are validated using Monte-Carlo simulations. We consider a standard, vertical, linear array with half-wavelength separation between elements. The MPDR algorithm is used to estimate directions of arrival and the corresponding powers. Spatially uncorrelated, zero-mean noise with a variance of one corrupts the signal snapshots. We point out that the derived characterizations hold for more general arrival processes, ambient noise and array shapes.

The simulations and analytical expressions are compared on plots of the expected estimated power versus angle of arrival for a fixed diagonal loading value, as well as on plots of the expected estimated power versus diagonal loading for a fixed steering angle. Two different scenarios are considered.

Scenario  1 

In the first scenario, 2 signals are arriving at elevation angles of 90° and 92° and each has power 10. The array has 30 sensors and 50 snapshots are used to estimate the sample correlation matrix. Note that the corresponding $c = 0.6$ and the normalized trace of the ensemble correlation matrix is 21.
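The quoted values of $c$ and of the normalized trace can be reproduced from the scenario description. The Python sketch below is an illustration under stated assumptions: the sources are taken to be mutually uncorrelated, the array response has unit-modulus entries, and the elevation angle is measured from the array axis. The helper names are hypothetical. It also evaluates what fraction of the normalized trace a diagonal loading of 0.3 (the value used in the comparisons of this scenario) represents.

```python
import numpy as np

def steering_vector(m, elevation_deg):
    """Response of a half-wavelength-spaced linear array (assumed angle convention)."""
    k = np.arange(m)
    return np.exp(1j * np.pi * k * np.cos(np.deg2rad(elevation_deg)))

def ensemble_correlation(m, angles_deg, powers, noise_var=1.0):
    """R = sum_i p_i a_i a_i^H + noise_var * I for uncorrelated sources (assumption)."""
    r = noise_var * np.eye(m, dtype=complex)
    for angle, p in zip(angles_deg, powers):
        a = steering_vector(m, angle)
        r += p * np.outer(a, a.conj())
    return r

m, n = 30, 50
R = ensemble_correlation(m, [90.0, 92.0], [10.0, 10.0])
c = m / n
normalized_trace = np.real(np.trace(R)) / m      # tr(R) / m
loading_percent = 100.0 * 0.3 / normalized_trace # loading 0.3 relative to tr(R)/m
print(c, normalized_trace, round(loading_percent, 2))
```

Because every entry of the assumed steering vector has unit modulus, the normalized trace reduces to the sum of the source powers plus the noise variance, independently of the angle convention.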

The comparison between the mean of the input power estimate (3.8), evaluated via simulations, and the corresponding theoretical prediction (3.62) for a diagonal loading of 0.3 is shown in the top part of Fig. 3-3. A similar agreement between the simulated mean of the power estimate (3.9) and the theoretical prediction (3.63) for the same diagonal loading of 0.3 is obtained in the bottom part of Fig. 3-3. Note that a diagonal loading of 0.3 is 1.43% of the normalized trace of the ensemble correlation matrix.

The comparisons between the simulated means of the power estimators (3.8) and (3.9) and the theoretical predictions (3.62) and (3.63) for a steering angle of 87° are shown in Fig. 3-2. The good agreement between the plots validates the accuracy of the asymptotic results in predicting the expected values of the power estimators for finite values of $m$ and $n$.

Figure 3-2: Scenario 1: Expected power versus diagonal loading for steering angle of 87°. The normalized trace of the corresponding ensemble correlation matrix is 21.

Figure 3-3: Scenario 1: Expected power versus steering angle for diagonal loading of 0.3, for estimator (a) and estimator (b). This value of diagonal loading is 1.43% of the normalized trace of the ensemble correlation matrix.
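The simulated curves in such comparisons are averages of the two power estimators over independent trials. A minimal Monte-Carlo sketch in Python is given below. It is an assumption-laden illustration rather than the code behind the figures: the estimators are taken in the forms $\hat{P}_a = (Q_1 - \delta Q_2)/Q_1^2$ and $\hat{P}_b = 1/Q_1$ with $Q_1 = \mathbf{v}^H \hat{\mathbf{R}}_\delta^{-1} \mathbf{v}$ and $Q_2 = \mathbf{v}^H \hat{\mathbf{R}}_\delta^{-2} \mathbf{v}$, as suggested by (3.62) and (3.63), and the sources and noise are complex Gaussian and uncorrelated. Function names and the steering vector normalization are hypothetical.

```python
import numpy as np

def steering_vector(m, elevation_deg):
    """Half-wavelength linear array response (assumed angle convention)."""
    k = np.arange(m)
    return np.exp(1j * np.pi * k * np.cos(np.deg2rad(elevation_deg)))

def simulate_mean_powers(m, n, src_angles, src_powers, steer_deg, delta,
                         trials=200, seed=0):
    """Monte Carlo means of the assumed forms of P_a and P_b."""
    rng = np.random.default_rng(seed)
    A = np.stack([steering_vector(m, a) for a in src_angles], axis=1)  # m x K
    amp = np.sqrt(np.asarray(src_powers, dtype=float))
    v = steering_vector(m, steer_deg)
    K = A.shape[1]
    pa = np.empty(trials)
    pb = np.empty(trials)
    for t in range(trials):
        s = (rng.standard_normal((K, n)) + 1j * rng.standard_normal((K, n))) / np.sqrt(2)
        w = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
        x = A @ (amp[:, None] * s) + w              # snapshots: signals plus unit noise
        r_hat = x @ x.conj().T / n                  # sample correlation matrix
        r_delta = r_hat + delta * np.eye(m)         # diagonally loaded matrix
        u = np.linalg.solve(r_delta, v)             # R_delta^{-1} v
        q1 = np.real(v.conj() @ u)                  # v^H R_delta^{-1} v
        q2 = np.real(u.conj() @ u)                  # v^H R_delta^{-2} v
        pa[t] = (q1 - delta * q2) / q1 ** 2
        pb[t] = 1.0 / q1
    return pa.mean(), pb.mean()

pa_mean, pb_mean = simulate_mean_powers(30, 50, [90.0, 92.0], [10.0, 10.0],
                                        steer_deg=87.0, delta=0.3)
print(pa_mean, pb_mean)
```

With these forms, $\hat{P}_a \le \hat{P}_b$ holds in every trial and the two estimators coincide at $\delta = 0$ when $n > m$, which gives a quick sanity check on any implementation.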




Scenario  2 


In the second scenario, 2 signals are arriving at elevation angles of 90° and 94° and their SNRs are 1 dB and 5 dB, respectively. The array has 40 sensors and 25 snapshots are used to estimate the sample correlation matrix. Note that the number of snapshots per sensor in this scenario is smaller than 1, i.e., $c = 1.6$, and the normalized trace of the ensemble correlation matrix is 5.42.
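As a quick arithmetic check of these numbers, the following Python sketch converts the two SNRs to linear powers and adds the unit noise variance. It assumes that the normalized trace equals the sum of the source powers plus the noise variance, which holds for uncorrelated sources and a steering vector with unit-modulus entries. The helper name is hypothetical.

```python
def db_to_linear(snr_db, noise_var=1.0):
    """Signal power corresponding to an SNR in dB for the given noise variance."""
    return noise_var * 10.0 ** (snr_db / 10.0)

m, n = 40, 25
powers = [db_to_linear(1.0), db_to_linear(5.0)]
c = m / n                                  # sensors per snapshot
normalized_trace = sum(powers) + 1.0       # source powers plus unit noise variance
print(c, round(normalized_trace, 2))
```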

The comparisons between the expected values of $\hat{P}_a$ and $\hat{P}_b$, obtained from Monte-Carlo simulations, and the corresponding theoretical predictions (3.62) and (3.63) for a diagonal loading of 20 and varying steering direction are shown in Fig. 3-5. The corresponding comparisons for a steering direction of 87° and varying diagonal loading are shown in Fig. 3-4. The presented plots validate the accuracy of the theoretical predictions. Note that the normalized trace of the corresponding ensemble correlation matrix is 5.42.

3.6  Mean  Square  Error  of  Power  Estimators 

As discussed in Section 3.5.2, the asymptotically unbiased, diagonally unloaded estimator might have a large variance when the number of snapshots $n$ is only slightly greater than the number of sensors $m$. This motivates the study of the variance and the estimation mean square error (MSE). The variance and estimation MSE corresponding to the power estimator $\hat{P}_a$ are evaluated in this section.

The received snapshots are assumed to be Gaussian distributed throughout this section. Using the Gaussian tools from Section 2.6, we evaluate the mean of the power estimator in the large $m, n$ limit. Although the asymptotic analysis of the means of the power estimators has been conducted in the previous section, restricting the process to be Gaussian distributed allows us to characterize the order of convergence of the expectations of the power estimators to their limiting values.

Furthermore, by imposing the Gaussian assumption on the received snapshots, we prove that the deviation of the power estimator from its mean converges in distribution, when $m$ and $n$ grow large at the same rate, to a Gaussian random variable. The variance of the Gaussian distribution is evaluated and it relatively accurately approximates the variance of the power estimator $\hat{P}_a$ for finite $m$ and $n$.

Figure 3-4: Scenario 2: Expected power versus diagonal loading for steering angle of 87°. The normalized trace of the ensemble correlation matrix is 5.42.

Figure 3-5: Scenario 2: Expected power versus steering angle for diagonal loading of 20, for estimator (a) and estimator (b). The normalized trace of the ensemble correlation matrix is 5.42.

The method presented in this section closely follows the Gaussian Method introduced in [25], wherein the limiting behavior of the mutual information of MIMO channels is studied. This section outlines the steps used in characterizing the MSE performance of the power estimator $\hat{P}_a$. The derived characterization is validated via Monte-Carlo simulations and these results are presented in the second part.


3.6.1  Major  Steps  in  the  Derivation  of  Estimation  MSE 

In short, the following analysis studies how the estimator $\hat{P}_a$ behaves in the limit when $m, n \to \infty$ such that $m/n \to c \in (0, \infty)$. In particular,

• We first evaluate the mean values of the quadratic forms $Q_1$ and $Q_2$, denoted respectively by $\mu_1$ and $\mu_2$. To keep the notation simple, the explicit dependence of the quadratic forms on the diagonal loading $\delta$ is suppressed.

• Then, it is shown that the deviation of the vector $\mathbf{q} = [Q_1 \;\; Q_2]^T$ from its mean $\bar{\mathbf{q}} = [\mu_1 \;\; \mu_2]^T$ converges in distribution, in the large $m, n$ limit, to a Gaussian random vector. The covariance matrix $\boldsymbol{\Sigma}$ of the limiting distribution is computed.

• Finally, it is obtained that the estimator $\hat{P}_a$ is approximately Gaussian distributed. The mean value and variance of the approximating distribution are evaluated.


Evaluation of $\mu_1$: Step 1

This and the subsequent part analyze the behavior of the first moment of the quadratic form $Q_1$. Their purpose is to show how the Gaussian tools work in practice and to give a glimpse of how other quantities important for characterizing $\hat{P}_a$ are evaluated.

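As a starting point, we introduce a resolvent identity, obtained directly from the definition of $\mathbf{H}$ in (3.22),

$$\mathbf{H} = \mathbf{I} - \frac{t}{n}\mathbf{H}\mathbf{Y}\mathbf{Y}^H. \tag{3.64}$$

Since $\mathbf{s}$ is deterministic,

$$E\left[t\,\mathbf{s}^H \mathbf{H} \mathbf{s}\right] = t\,\mathbf{s}^H E\left[\mathbf{H}\right] \mathbf{s}. \tag{3.65}$$

It follows from (3.64) that

$$E\left[H_{ij}\right] = \delta_D(i - j) - \frac{t}{n} E\left[(\mathbf{H}\mathbf{Y}\mathbf{Y}^H)_{ij}\right]. \tag{3.66}$$

The second term in (3.66) is given by

$$E\left[(\mathbf{H}\mathbf{Y}\mathbf{Y}^H)_{ij}\right] = \sum_{k=1}^{m}\sum_{l=1}^{n} E\left[H_{ik} Y_{kl} Y_{jl}^*\right]. \tag{3.67}$$

Using the integration by parts formula (2.66) with $f(\mathbf{Y}) = H_{ik} Y_{jl}^*$, the summand in (3.67) is expressed as

$$E\left[H_{ik} Y_{kl} Y_{jl}^*\right] = \lambda_k E\left[\frac{\partial (H_{ik} Y_{jl}^*)}{\partial Y_{kl}^*}\right]. \tag{3.68}$$

The derivative of the resolvent $\mathbf{H}$ is, for any indices $i$, $j$, $k$ and $l$, given by [25]

$$\frac{\partial H_{ij}}{\partial Y_{kl}^*} = -\frac{t}{n}(\mathbf{H}\mathbf{Y})_{il} H_{kj}. \tag{3.69}$$

Substituting (3.69) into (3.68) yields

$$E\left[H_{ik} Y_{kl} Y_{jl}^*\right] = \lambda_k E\left[H_{ik}\,\delta_D(j - k) - \frac{t}{n}(\mathbf{H}\mathbf{Y})_{il} H_{kk} Y_{jl}^*\right]. \tag{3.70}$$

Now, summing (3.70) over $k$ yields

$$E\left[(\mathbf{H}\mathbf{Y})_{il} Y_{jl}^*\right] = E\left[\lambda_j H_{ij} - \frac{t}{n}\mathrm{tr}\{\mathbf{D}\mathbf{H}\}(\mathbf{H}\mathbf{Y})_{il} Y_{jl}^*\right]. \tag{3.71}$$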

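Introducing $\beta = \frac{1}{n}\mathrm{tr}\{\mathbf{D}\mathbf{H}\}$, expressing it as $\beta = \alpha + \mathring{\beta}$, where $\alpha = E[\beta]$ and $E[\mathring{\beta}] = 0$, and substituting into (3.71) yields

$$(1 + t\alpha)\, E\left[(\mathbf{H}\mathbf{Y})_{il} Y_{jl}^*\right] = E\left[\lambda_j H_{ij} - t\,\mathring{\beta}\,(\mathbf{H}\mathbf{Y})_{il} Y_{jl}^*\right]. \tag{3.72}$$

Summing (3.72) over $l$ leads to

$$(1 + t\alpha)\, E\left[(\mathbf{H}\mathbf{Y}\mathbf{Y}^H)_{ij}\right] = n\lambda_j E\left[H_{ij}\right] - t\, E\left[\mathring{\beta}\,(\mathbf{H}\mathbf{Y}\mathbf{Y}^H)_{ij}\right]. \tag{3.73}$$

Dividing (3.73) by $1 + t\alpha$, substituting the resulting expression into (3.66) and after simple algebraic manipulations, we obtain

$$E\left[H_{ij}\right] = g_j\,\delta_D(i - j) + \frac{t^2 g_j}{n(1 + t\alpha)}\, E\left[\mathring{\beta}\,(\mathbf{H}\mathbf{Y}\mathbf{Y}^H)_{ij}\right], \tag{3.74}$$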


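where $g_j = \left(1 + \frac{t\lambda_j}{1 + t\alpha}\right)^{-1}$. Note that the $g_j$'s are uniformly bounded because the eigenvalues $\lambda_j$ are upper bounded (according to assumption 2). Finally, substituting (3.74) into (3.65) yields

$$E\left[Q_1\right] = t\,\mathbf{s}^H \mathbf{G} \mathbf{s} + \frac{t^3}{n(1 + t\alpha)}\, E\left[\mathring{\beta}\,\mathbf{s}^H \mathbf{H}\mathbf{Y}\mathbf{Y}^H \mathbf{G}\mathbf{s}\right], \tag{3.75}$$

where $\mathbf{G} = \mathrm{diag}\{g_1, \ldots, g_m\}$ and $\mathring{\beta} = \beta - E[\beta]$. Using similar reasoning as in [25], it can be proven that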

s^Gs  =  s-^Ts  +  O  ,  (3.76) 


where $\mathbf{T} = (\mathbf{I} + t y \mathbf{D})^{-1}$ and $y$ is the unique positive solution of the non-linear equation

$$1 + \frac{t}{n}\sum_{i=1}^{m}\frac{\lambda_i}{1 + t\lambda_i y} = \frac{1}{y}. \tag{3.77}$$
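The quantities in this step can be explored numerically. The Python sketch below is a hedged illustration, not the author's code. It assumes that (3.22) defines $\mathbf{H} = (\mathbf{I} + \frac{t}{n}\mathbf{Y}\mathbf{Y}^H)^{-1}$, that $\mathbf{Y}$ has independent complex Gaussian entries with $E|Y_{kl}|^2 = \lambda_k$, and that the non-linear equation has the form $1 + \frac{t}{n}\sum_i \lambda_i/(1 + t\lambda_i y) = 1/y$. It solves that equation for $y$ by bisection, forms $\mathbf{T}$, verifies the resolvent identity (3.64) on one realization, and compares a Monte-Carlo average of $\mathbf{s}^H\mathbf{H}\mathbf{s}$ with $\mathbf{s}^H\mathbf{T}\mathbf{s}$. The eigenvalues, sizes and function names are hypothetical.

```python
import numpy as np

def solve_y(lam, t, n, tol=1e-13):
    """Bisection for y > 0 in 1 + (t/n) * sum_i lam_i / (1 + t*lam_i*y) = 1/y."""
    lam = np.asarray(lam, dtype=float)

    def h(y):
        # The equation multiplied by y and rearranged; h is increasing on [0, 1].
        return y + (t / n) * np.sum(lam * y / (1.0 + t * lam * y)) - 1.0

    lo, hi = 0.0, 1.0            # h(0) = -1 < 0 and h(1) >= 0 for t, lam >= 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def simulate_quadratic_form(lam, s, t, n, trials=50, seed=0):
    """Monte Carlo mean of s^H H s; also returns the last H and Y drawn."""
    rng = np.random.default_rng(seed)
    m = lam.size
    total = 0.0
    for _ in range(trials):
        x = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
        y_mat = np.sqrt(lam)[:, None] * x                   # E|Y_kl|^2 = lam_k
        h_mat = np.linalg.inv(np.eye(m) + (t / n) * y_mat @ y_mat.conj().T)
        total += np.real(s.conj() @ h_mat @ s)
    return total / trials, h_mat, y_mat

m, n, t = 20, 40, 1.0
lam = np.linspace(0.5, 2.0, m)                   # hypothetical eigenvalues of R
s = np.ones(m, dtype=complex) / np.sqrt(m)       # hypothetical unit-norm vector
y = solve_y(lam, t, n)
T = np.diag(1.0 / (1.0 + t * y * lam))
predicted = np.real(s.conj() @ T @ s)
simulated, H, Y = simulate_quadratic_form(lam, s, t, n)
identity_ok = np.allclose(H, np.eye(m) - (t / n) * H @ Y @ Y.conj().T)
print(y, predicted, simulated, identity_ok)
```

Bisection is used here only because the rearranged equation is monotone on $[0, 1]$, which makes the root bracket obvious; a fixed-point iteration or Newton's method would serve equally well.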


To complete the evaluation of $\mu_1$, it remains to characterize the second term in (3.75). In fact, the following part shows that this term decays as $1/n$. In doing so, the Poincaré-Nash inequality (2.67) is exploited.

Evaluation of $\mu_1$: Step 2

The second term in (3.75), denoted by $\tilde{Q}_1$, is, after introducing

$$\Phi = \frac{1}{n^2}\,\mathbf{s}^H \mathbf{H}\mathbf{Y}\mathbf{Y}^H \mathbf{G}\mathbf{s}, \tag{3.78}$$

compactly written as

$$\tilde{Q}_1 = nK\,E\left[\mathring{\beta}\,\Phi\right] = nK\,E\left[\mathring{\beta}\,\mathring{\Phi}\right], \tag{3.79}$$

where $\mathring{\Phi} = \Phi - E[\Phi]$ and $K = \frac{t^3}{1 + t\alpha} = O(1)$. From the Cauchy-Schwarz inequality,

$$\tilde{Q}_1 \le nK\sqrt{\mathrm{var}(\beta)\,\mathrm{var}(\Phi)}. \tag{3.80}$$

It has been shown in [25] that $\mathrm{var}(\beta) = O(n^{-2})$. In the following, we prove that $\mathrm{var}(\Phi) = O(n^{-2})$.

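Applying the Poincaré-Nash inequality (2.67) to the functional $\Phi$ yields

$$\mathrm{var}(\Phi) \le \sum_{i=1}^{m}\sum_{j=1}^{n} \lambda_i\, E\left[\left|\frac{\partial \Phi}{\partial Y_{ij}}\right|^2 + \left|\frac{\partial \Phi}{\partial Y_{ij}^*}\right|^2\right]. \tag{3.81}$$

Expanding $\Phi$, taking its derivatives with respect to $Y_{ij}$ and $Y_{ij}^*$ and collecting terms yields

$$\frac{\partial \Phi}{\partial Y_{ij}} = \frac{1}{n^2}\left(\Phi_1 + \Phi_2\right), \qquad \frac{\partial \Phi}{\partial Y_{ij}^*} = \frac{1}{n^2}\left(\Phi_3 + \Phi_4\right), \tag{3.82}$$

where

$$\Phi_1 = (\mathbf{s}^H\mathbf{H})_i\,(\mathbf{Y}^H\mathbf{G}\mathbf{s})_j, \qquad \Phi_2 = -\frac{t}{n}(\mathbf{s}^H\mathbf{H})_i\,(\mathbf{Y}^H\mathbf{H}\mathbf{Y}\mathbf{Y}^H\mathbf{G}\mathbf{s})_j,$$

$$\Phi_3 = (\mathbf{G}\mathbf{s})_i\,(\mathbf{s}^H\mathbf{H}\mathbf{Y})_j, \qquad \Phi_4 = -\frac{t}{n}(\mathbf{s}^H\mathbf{H}\mathbf{Y})_j\,(\mathbf{H}\mathbf{Y}\mathbf{Y}^H\mathbf{G}\mathbf{s})_i.$$

Using the inequality $(a + b)^2 \le 2(a^2 + b^2)$ and substituting (3.82) into (3.81) yields

$$\mathrm{var}(\Phi) \le \frac{2}{n^4}\sum_{i=1}^{m}\sum_{j=1}^{n} \lambda_i\, E\left[|\Phi_1|^2 + |\Phi_2|^2 + |\Phi_3|^2 + |\Phi_4|^2\right]. \tag{3.83}$$

Now, we need to upper bound each expectation in (3.83). Here, we evaluate an upper bound of the term involving $\Phi_2$. Namely,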


1=1  j=i 

2t^ 


=  — E  [s^GYY^HDHYY^Gss^HYY^Hsl 

^6  L  J 


(“)  2t^ 
< 


(P)  2^ 

<  —K^ 

n® 


iGir  iiHriiDi 


H\‘^ 


YY^) 


YY 


H\ 


tr  (YY^)  h/tr  (YY^) 


(<=)  K 
<  — 


\ 


E 


hr  (XE 

n  \  \  n 


\ 


E 


(J  ^  /  1 


hr  (XE 

n  \  \  n 


(3.84) 


where (a) and (c) follow from the Cauchy-Schwarz inequality. Inequality (b) follows from the definition of the matrix norm and the facts that $\mathbf{s}$, $\mathbf{D}$, $\mathbf{H}$ and $\mathbf{G}$ are uniformly bounded in norm because, respectively, $\mathbf{s}$ is a unitary rotation of the array manifold vector, the largest eigenvalue of $\mathbf{R}$ is uniformly bounded by assumption 2, and, by definition, $\|\mathbf{H}\| \le 1$ and $\|\mathbf{G}\| \le g_{\max} < \infty$. Finally, the last step follows from the fact that the moments of the normalized trace appearing in the bound are $O(1)$ for any integer order $k$ [25]. Similar reasoning is applied to the other terms in (3.82).


Finally, from (3.80) and (3.84), we conclude that the second term in (3.75) is $O(n^{-1})$. Substituting this result and (3.76) into (3.75) yields

$$E\left[Q_1\right] = t\,\mathbf{s}^H \mathbf{T} \mathbf{s} + O(n^{-1}). \tag{3.85}$$


The method presented in this and the previous part is general in the sense that it can be used to characterize first moments of different functionals of the resolvent matrix $\mathbf{H}$.

Statistics of the Vector $\mathbf{q}$

This part outlines a method used to prove that the deviation of the vector $\mathbf{q} = [Q_1 \;\; Q_2]^T$ from its mean $\bar{\mathbf{q}}$ is Gaussian distributed in the limit when $m, n \to \infty$ such that $m/n \to c \in (0, \infty)$. The covariance $\boldsymbol{\Sigma}$ of the limiting distribution is given by

$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}, \tag{3.86}$$

where $\sigma_1^2$, $\sigma_2^2$ and $\sigma_{12}$ are, respectively, the variances of $Q_1$ and $Q_2$ and the covariance between them.


We introduce a new random variable $Q$ as a linear combination of the centered $Q_1$ and $Q_2$, namely

$$Q = A(Q_1 - \mu_1) + B(Q_2 - \mu_2), \tag{3.87}$$

where $\mu_1$ and $\mu_2$ are the expectations of the limiting distributions of $Q_1$ and $Q_2$. Note that $\mu_1$ has been evaluated in the previous part. The variance of $Q$ is given by

$$\sigma_Q^2 = A^2\sigma_1^2 + B^2\sigma_2^2 + 2AB\,\sigma_{12}. \tag{3.88}$$

Note that the variances of $Q_1$ and $Q_2$ and the covariance between them can be identified from $\sigma_Q^2$ by inspection.
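The identification "by inspection" amounts to reading off the coefficients of $A^2$, $B^2$ and $2AB$ in (3.88). A small Python sketch of the same idea on synthetic data is given below. The two correlated samples merely stand in for $Q_1$ and $Q_2$ and are not generated from the resolvent. The helper `var_q` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal((2, 5000))
q1 = 1.0 + 0.3 * z[0]                      # synthetic stand-in for Q1
q2 = 0.5 + 0.2 * z[0] + 0.1 * z[1]         # synthetic stand-in for Q2, correlated with q1

def var_q(a, b):
    """Sample variance of Q = a*(Q1 - mean) + b*(Q2 - mean)."""
    return np.var(a * (q1 - q1.mean()) + b * (q2 - q2.mean()))

sigma1_sq = var_q(1.0, 0.0)                                   # coefficient of A^2
sigma2_sq = var_q(0.0, 1.0)                                   # coefficient of B^2
sigma12 = 0.5 * (var_q(1.0, 1.0) - sigma1_sq - sigma2_sq)     # coefficient of 2AB
Sigma = np.array([[sigma1_sq, sigma12], [sigma12, sigma2_sq]])
print(Sigma)
```

Evaluating the variance at $(A, B) = (1, 0)$, $(0, 1)$ and $(1, 1)$ is enough to recover all three entries of the covariance matrix.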

Further, we denote the characteristic function of $Q$ by $\Psi(\omega)$ and resort to the fact that if, in the limit,

$$\Psi(\omega) = E\left[e^{j\omega Q}\right] \to e^{-\omega^2\sigma_Q^2/2}, \tag{3.89}$$

then, in distribution,

$$\sigma_Q^{-1}(Q - \bar{Q}) \to \mathcal{N}(0, 1). \tag{3.90}$$

In other words, if the characteristic function of $Q$ converges to that of a Gaussian random variable, then $Q$ itself is Gaussian distributed in the limit. The convergence (3.89) is established by showing that (details in [25])

$$\frac{\partial \Psi}{\partial \omega} = -\omega\,\sigma_Q^2\,\Psi + \epsilon, \tag{3.91}$$

where $\epsilon$ converges to zero in the limit. In addition, $\sigma_Q^2$ is recovered from (3.91).

To establish (3.91), we start by taking the first derivative of the characteristic function $\Psi$ of $Q$,

$$\frac{\partial \Psi}{\partial \omega} = jA\,E\left[Q_1 e^{j\omega Q}\right] + jB\,E\left[Q_2 e^{j\omega Q}\right] - j\Psi\,(A\mu_1 + B\mu_2). \tag{3.92}$$

In the further development, we evaluate $E\left[Q_k e^{j\omega Q}\right]$ for $k = 1$ and $k = 2$. This is essentially done by using the integration by parts formula (2.66) and the Poincaré-Nash inequality (2.67). After a long interplay between these two Gaussian tools, the Cauchy-Schwarz inequality and messy algebra, we obtain (3.91) and extract $\sigma_Q^2$. The components of $\sigma_Q^2$ are evaluated using (3.88) as

[s^HDHs]  E  [s^HYY^Gs] 

=  ^{E  [s-^HDH^s]  E  [s^HYY^G(I  +  H)s] 

+E  [s^HDHs]  E  [s-^HYY^G(I  +  H)s]  } 
ai2  =  ^{E  [s^HDH^s]  E  [s^HYY^Gs] 

+E  [s^HDHs]  E  [s-^H^YY^Gs] 

+E  [s^HDHs]  E  [s'^HYY^G(I  +  H)s]  } 


The expectations in the above expressions are evaluated using the Gaussian tools. The resulting expressions admit closed forms in terms of $y$ and traces and quadratic forms involving the diagonal matrices $\mathbf{D}$ and $\mathbf{T}$. The only difficulty is solving the non-linear equation (3.77) for $y$, while the other quantities are easily calculated. We omit the presentation of the final expressions.

Having established (3.91), we conclude that when $m, n \to \infty$ such that $m/n \to c \in (0, \infty)$, $Q$ has a Gaussian distribution with mean $\bar{Q} = A\mu_1 + B\mu_2$ and variance $\sigma_Q^2$. Similarly, the vector $\mathbf{q} = [Q_1 \;\; Q_2]^T$ is also Gaussian distributed in the limit, i.e.,

$$\boldsymbol{\Sigma}^{-1/2}(\mathbf{q} - \bar{\mathbf{q}}) \to \mathcal{N}(\mathbf{0}, \mathbf{I}), \tag{3.93}$$

where $\bar{\mathbf{q}} = [\mu_1 \;\; \mu_2]^T$ is the expectation of the limiting Gaussian. The covariance matrix $\boldsymbol{\Sigma}$ of the limiting distribution is evaluated using (3.86).

The final major step in the statistical characterization of $\hat{P}_a$ is outlined in the following part.


Statistical Characterization of $\hat{P}_a$

Having established (3.93), we use the Delta method [25] to conclude that in the limit when $m, n \to \infty$ such that $m/n \to c \in (0, \infty)$,

$$\sigma_{P_a}^{-1}\left(\hat{P}_a - \mu_{P_a}\right) \to \mathcal{N}(0, 1), \tag{3.94}$$

in distribution, where $\mu_{P_a}$ and $\sigma_{P_a}^2$ are the mean value and variance of the limiting Gaussian distribution. The mean value is evaluated as

$$\mu_{P_a} = \frac{t\mu_1 - \mu_2}{t\mu_1^2}. \tag{3.95}$$

On the other hand, the variance is evaluated using

$$\sigma_{P_a}^2 = \nabla \hat{P}_a(\mu_1, \mu_2)\,\boldsymbol{\Sigma}\,\nabla \hat{P}_a^T(\mu_1, \mu_2), \tag{3.96}$$

where $\nabla \hat{P}_a(\mu_1, \mu_2)$ is the gradient of $\hat{P}_a$ evaluated at the point $(\mu_1, \mu_2)$, i.e.,

$$\nabla \hat{P}_a(\mu_1, \mu_2) = \left[\frac{\partial \hat{P}_a}{\partial Q_1}(\mu_1, \mu_2) \;\;\; \frac{\partial \hat{P}_a}{\partial Q_2}(\mu_1, \mu_2)\right]. \tag{3.97}$$

Thus, by evaluating the derivatives of $\hat{P}_a$ with respect to $Q_1$ and $Q_2$ at $(\mu_1, \mu_2)$ and making the necessary substitutions in (3.95) and (3.96), the final expressions for $\mu_{P_a}$ and $\sigma_{P_a}^2$ are obtained.

As a last step, we exploit the fact that (3.94) converges rapidly so that the expectation and variance of $\hat{P}_a$ are, for finite $m$ and $n$, approximated as

$$E\left[\hat{P}_a\right] \approx \mu_{P_a} \quad \text{and} \quad \mathrm{var}\left(\hat{P}_a\right) \approx \sigma_{P_a}^2. \tag{3.98}$$

Finally, the MSE of $\hat{P}_a$ is evaluated by substituting (3.98) into (3.10).
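The Delta-method step can be sketched in a few lines of Python. The sketch assumes that the estimator depends on the quadratic forms as $\hat{P}_a = (tQ_1 - Q_2)/(tQ_1^2)$, which is the functional form suggested by the expression for the limiting mean, and it uses made-up values for $t$, $\mu_1$, $\mu_2$ and $\boldsymbol{\Sigma}$. None of the names or numbers come from the thesis.

```python
import numpy as np

def power_a(q1, q2, t):
    """Assumed map from the quadratic forms to the power estimate."""
    return (t * q1 - q2) / (t * q1 ** 2)

def grad_power_a(q1, q2, t):
    """Partial derivatives of power_a with respect to q1 and q2."""
    d_q1 = (2.0 * q2 - t * q1) / (t * q1 ** 3)
    d_q2 = -1.0 / (t * q1 ** 2)
    return np.array([d_q1, d_q2])

def delta_method(mu1, mu2, Sigma, t):
    """Limiting mean and variance: P(mu) and grad^T Sigma grad."""
    g = grad_power_a(mu1, mu2, t)
    return power_a(mu1, mu2, t), float(g @ Sigma @ g)

# Illustrative (made-up) numbers
t, mu1, mu2 = 2.0, 1.0, 0.5
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])
mean_p, var_p = delta_method(mu1, mu2, Sigma, t)
print(mean_p, var_p)
```

For these numbers the gradient is $[-0.5, -0.5]$, so the variance is $0.25 \times (0.04 + 2 \times 0.01 + 0.09)$.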


3.6.2  Numerical  Validation  of  Derived  Expressions 

The derived expression for the mean square error of the spatial power estimator $\hat{P}_a$ is validated using Monte-Carlo simulations. We consider a standard, vertical, linear array with $\lambda/2$ separation of the elements. The received signal is composed of plane waves. The MPDR algorithm is used to estimate their directions of arrival and the corresponding powers. Spatially uncorrelated, zero-mean noise with a variance of one corrupts the signal snapshots. The true power is defined in a standard way using (3.27). Note that the derived characterization of the estimation MSE holds for more general arrival processes, ambient noise and array shapes.

Scenario  1 

In  this  scenario,  2  signals  are  arriving  at  elevation  angles  of  90°  and  92°.  Each  signal 
has  power  10.  The  array  has  30  sensors  and  50  snapshots  are  used  to  estimate  the 
correlation  matrix.  Note  that  c  =  0.6  and  the  normalized  trace  of  the  ensemble 
correlation  matrix  is  21. 

The comparison between the dependences of the simulated and theoretically predicted MSEs on diagonal loading for a fixed steering direction is shown in Fig. 3-6. The plots in the top part correspond to steering away from the sources in the direction of 85°. The comparison in the bottom part corresponds to steering halfway between the sources in the direction of 91°. The true power is evaluated using the standard approach (3.27). The shown comparisons validate the accuracy of the derived theoretical prediction.

The simulation study has shown that the derived prediction of the MSE performance is most sensitive around the values of the optimal diagonal loading when steering close to the sources (this is, however, an important case, especially when steering towards a weak source near a strong one). To further investigate this issue, the MSEs obtained via simulations and using the theoretical prediction are compared in Fig. 3-7. The MSEs in each steering direction are evaluated at the diagonal loading which minimizes the corresponding simulated MSE. As can be observed, the largest deviation happens when steering close to the source directions. As an aside, it can be observed that the optimized MSE is relatively large when pointing at the source. Addressing this issue is a possible direction for future research.

Figure 3-6: Scenario 1: MSE versus diagonal loading when steering angle is 85° (top figure) and 91° (bottom figure). True power uses the traditional definition as given in (3.27).

Figure 3-7: Scenario 1: Optimized MSE versus elevation angle. True power uses the traditional definition as given in (3.27).

Scenario 2

In this scenario, 2 signals are arriving at elevation angles of 90° and 94° and their SNRs are 1 dB and 5 dB, respectively. The array has 40 sensors and 25 snapshots are used to estimate the correlation matrix. Note that in this case the number of sensors is larger than the number of snapshots, i.e., $c = 1.6$. Also, the normalized trace of the ensemble correlation matrix is 5.42.

The comparisons between the dependences of the simulated and theoretically predicted MSEs on diagonal loading for particular values of the steering direction, with the true power defined in a standard way using (3.27), are shown in Fig. 3-8. The comparison in the top part corresponds to steering away from the sources in the direction of 87°, while that in the bottom part corresponds to steering halfway between the sources in the direction of 92°.

As in the previous scenario, the theoretical prediction of the MSE performance is most sensitive when steering close to the sources and the diagonal loading is around the optimal diagonal loading. The comparison between the optimal MSEs evaluated via simulations and by using the theoretical prediction is shown in the top part of Fig. 3-9. The bottom part of Fig. 3-9 compares the dependences of the simulated and theoretically predicted MSEs on diagonal loading when steering very close to the sources in the direction of 89.5°. As can be observed from these comparisons, the theoretical prediction of the MSE is inaccurate when steering close to the source directions and the diagonal loading is around the value which optimizes the MSE performance. Finding the causes of this disagreement and improving the prediction is one possible future direction.

Figure 3-8: Scenario 2: MSE versus diagonal loading when steering angle is 87° (top figure) and 92° (bottom figure). True power uses the traditional definition as given in (3.27).

Figure 3-9: Scenario 2: Optimized MSE versus elevation angle (top figure). MSE versus diagonal loading for steering angle of 89.5° (bottom figure). True power uses the traditional definition as given in (3.27).


3.7  Optimization of Mean Square Error

Having established how the squared bias and variance depend on the loading $\delta$, we turn our attention to studying how these quantities are interrelated. This section conjectures in the first part that the variance has an insignificant impact on the diagonal loading which optimizes the estimation MSE performance. The numerical validation of this result is presented in the second part.

Throughout this section, we do not explicitly show the dependence of the power estimators on the steering direction $\mathbf{v}_s$, i.e., with slight abuse of notation we write $\hat{P}(\delta)$.


3.7.1  Estimation  MSE  versus  Squared  Bias  Optimization 

In this part, we present arguments to support our conjecture that $\mathrm{var}(\hat{P}(\delta))$ has negligible impact on the value of the optimal loading $\delta_{\mathrm{opt}}$.

The following two lemmas establish bounds and orders of decay on the first derivatives of the power estimators and on the variances and their first derivatives. The proofs are given in Appendices A and B.


Lemma 3.5. For a non-negative loading $\delta$ and under assumptions 2 and 3, the following bounds hold almost surely

1  dPa  <  am 

2  MM  <  2ib  c  MM 

m  —  dS  —  m  ^ 

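where $K$, $C_1$ and $C_2$ are some positive constants.

Lemma 3.6. If the received snapshots $\mathbf{u}(i)$ are Gaussian distributed, then under assumptions 1-4, the following hold:

1. $\mathrm{var}(\hat{P}_a) \le O(m^{-1})$ and $\mathrm{var}(\hat{P}_b) \le O(m^{-1})$,

2. $\mathrm{var}\left(\frac{\partial \hat{P}_a}{\partial \delta}\right) \le O(m^{-2})$ and $\mathrm{var}\left(\frac{\partial \hat{P}_b}{\partial \delta}\right) \le O(m^{-2})$.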


Recall that the previous lemma holds if we replace $m$ with $n$ because $m$ and $n$ grow at the same rate according to assumption 1.

It can be concluded that the same bounds as established in Lemma 3.5 apply in expectation. In short, $E\left[\frac{\partial \hat{P}}{\partial \delta}\right] = O(m^{-1})$. Note that the orders of decay established in Lemma 3.6 are not necessarily tight. Namely, Lemma 3.3 shows that if the input is Gaussian, $\delta \to \infty$ and $m$ and $n$ are of the same order, the variance of both estimators is

To conjecture that the variance has negligible impact on the optimal loading, we first note that outside some region around the point at which the squared bias attains zero, the variance is significantly smaller than the squared bias. Namely, note that the expected values of both estimators are at $\delta = 0$ and, as $\delta \to \infty$, the estimator $\hat{P}_a$ is of $O(m^{-1})$, while $\hat{P}_b$ is unbounded. On the other hand, the variances of both estimators are finite across the whole $\delta$ region and of $O(m^{-1})$.

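Next we consider how the slopes of the variance and squared bias behave with respect to the loading $\delta$. Starting with the former, the slope of the variance is expressed as the scaled covariance between the estimator and its slope and is upper bounded using the Cauchy-Schwarz inequality,

$$\frac{\partial}{\partial \delta}\mathrm{var}\left(\hat{P}\right) = 2\,\mathrm{cov}\left(\hat{P}, \frac{\partial \hat{P}}{\partial \delta}\right) \le 2\sqrt{\mathrm{var}\left(\hat{P}\right)\mathrm{var}\left(\frac{\partial \hat{P}}{\partial \delta}\right)}. \tag{3.99}$$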


The bounds on the variance of the estimator and its derivative are given in Lemma 3.6 and therefore the slope of the variance is upper bounded as

$$\frac{\partial}{\partial \delta}\mathrm{var}\left(\hat{P}\right) \le O(m^{-3/2}). \tag{3.100}$$


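On the other hand, the slope of the squared bias is given by

$$\frac{\partial}{\partial \delta}\mathrm{bias}^2(\delta) = 2\,E\left[\hat{P}(\delta) - P\right] E\left[\frac{\partial \hat{P}}{\partial \delta}\right]. \tag{3.101}$$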


Therefore, the slope of the squared bias is zero at the loading $\tilde{\delta}_{\mathrm{opt}}$ at which the squared bias itself attains zero. Outside the region around this point, the slope is of $O(m^{-1})$. The slope of the squared bias corresponding to $\hat{P}_a$ is zero at $\delta = 0$ and as $\delta \to \infty$. Overall, the slope of the squared bias varies more significantly than the slope of the variance.

To summarize,

1. The variance is significantly smaller than the squared bias outside some region around the point at which the squared bias equals zero.

2. The slope of the squared bias varies more significantly and is much larger than the slope of the variance.

The preceding two arguments imply that the variance has no significant impact on the value of the optimal loading. In other words, the loss incurred by using the loading $\tilde{\delta}_{\mathrm{opt}}$ that zeroes the squared bias instead of the MSE-optimal loading $\delta_{\mathrm{opt}}$ is negligible. This loss is quantified for the spatial power estimator $\hat{P}_b$ in the following part.

3.7.2  Estimation MSE Loss for Power Estimator $\hat{P}_b$

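Strictly speaking, the loading $\tilde{\delta}_{\mathrm{opt}}$ at which the squared bias attains zero is not the optimal diagonal loading $\delta_{\mathrm{opt}}$ which minimizes the MSE. However, according to the arguments presented in the previous part, $\tilde{\delta}_{\mathrm{opt}}$ and $\delta_{\mathrm{opt}}$ are not far from each other. To make the argument more precise, we study how much the MSE is degraded by using $\tilde{\delta}_{\mathrm{opt}}$ as an optimal loading. The MSE loss incurred by setting $\tilde{\delta}_{\mathrm{opt}}$ instead of $\delta_{\mathrm{opt}}$ is defined as

$$L(\tilde{\delta}_{\mathrm{opt}}, \delta_{\mathrm{opt}}) = \mathrm{MSE}(\tilde{\delta}_{\mathrm{opt}}) - \mathrm{MSE}(\delta_{\mathrm{opt}}). \tag{3.102}$$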


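We show that in the large $m, n$ regime, the MSE loss $L$ becomes negligible compared to the optimal MSE. In doing so, we assume without loss of generality that the true power $P$ is such that the optimal loading is non-zero and finite.

The squared bias attains zero at a loading denoted by $\tilde{\delta}_{\mathrm{opt}}$. It is convex and approximately quadratic in a certain interval around $\tilde{\delta}_{\mathrm{opt}}$. Therefore, within that interval, the squared bias is approximated with a second order Taylor polynomial,

$$\mathrm{bias}^2(\delta) \approx \mathrm{bias}^2(\tilde{\delta}_{\mathrm{opt}}) + \frac{\partial\, \mathrm{bias}^2(\tilde{\delta}_{\mathrm{opt}})}{\partial \delta}(\delta - \tilde{\delta}_{\mathrm{opt}}) + \frac{1}{2}\frac{\partial^2 \mathrm{bias}^2(\tilde{\delta}_{\mathrm{opt}})}{\partial \delta^2}(\delta - \tilde{\delta}_{\mathrm{opt}})^2. \tag{3.103}$$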


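Recall that $\mathrm{bias}^2(\tilde{\delta}_{\mathrm{opt}}) = 0$ and $\frac{\partial}{\partial \delta}\mathrm{bias}^2(\tilde{\delta}_{\mathrm{opt}}) = 0$, where $\tilde{\delta}_{\mathrm{opt}}$ is the loading at which the squared bias attains zero. The second derivative of the squared bias at $\tilde{\delta}_{\mathrm{opt}}$ is obtained by differentiating (3.101) and is given by

$$\frac{\partial^2 \mathrm{bias}^2(\tilde{\delta}_{\mathrm{opt}})}{\partial \delta^2} = 2\,E^2\left[\frac{\partial \hat{P}_b(\tilde{\delta}_{\mathrm{opt}})}{\partial \delta}\right]. \tag{3.104}$$

The squared bias is then approximated as

$$\mathrm{bias}^2(\delta) \approx E^2\left[\frac{\partial \hat{P}_b(\tilde{\delta}_{\mathrm{opt}})}{\partial \delta}\right](\delta - \tilde{\delta}_{\mathrm{opt}})^2. \tag{3.105}$$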


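On the other hand, the variance is approximated with a first order Taylor polynomial around the loading $\tilde{\delta}_{\mathrm{opt}}$ at which the squared bias attains zero. This is because the variability in the slope of the variance is much smaller than that of the squared bias. Hence, using (3.99),

$$\mathrm{var}\left(\hat{P}_b(\delta)\right) \approx \mathrm{var}\left(\hat{P}_b(\tilde{\delta}_{\mathrm{opt}})\right) + \frac{\partial\, \mathrm{var}\left(\hat{P}_b(\tilde{\delta}_{\mathrm{opt}})\right)}{\partial \delta}(\delta - \tilde{\delta}_{\mathrm{opt}})$$

$$= \mathrm{var}\left(\hat{P}_b(\tilde{\delta}_{\mathrm{opt}})\right) + 2\,\mathrm{cov}\left(\hat{P}_b(\tilde{\delta}_{\mathrm{opt}}), \frac{\partial \hat{P}_b(\tilde{\delta}_{\mathrm{opt}})}{\partial \delta}\right)(\delta - \tilde{\delta}_{\mathrm{opt}}). \tag{3.106}$$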


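Given that the squared bias and variance are approximated respectively with quadratic and linear functions, the problem of evaluating the MSE loss boils down to determining the loss incurred by declaring that the minimizer of the sum of these two functions is the argument which minimizes the quadratic function (Appendix C). Applying (C.3), this yields

$$L(\tilde{\delta}_{\mathrm{opt}}, \delta_{\mathrm{opt}}) = \frac{\mathrm{cov}^2\left(\hat{P}_b(\tilde{\delta}_{\mathrm{opt}}), \frac{\partial \hat{P}_b(\tilde{\delta}_{\mathrm{opt}})}{\partial \delta}\right)}{E^2\left[\frac{\partial \hat{P}_b(\tilde{\delta}_{\mathrm{opt}})}{\partial \delta}\right]}, \tag{3.107}$$

where $\tilde{\delta}_{\mathrm{opt}}$ is the loading at which the squared bias attains zero and $\delta_{\mathrm{opt}}$ is the loading which minimizes the MSE.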
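The loss formula can be checked on a toy quadratic-plus-linear model. The Python sketch below uses hypothetical names and made-up numbers. It assumes the squared bias is $a(\delta - \tilde{\delta})^2$ with $a = E^2[\partial \hat{P}_b/\partial \delta]$ and the variance is $v_0 + b(\delta - \tilde{\delta})$ with $b = 2\,\mathrm{cov}(\hat{P}_b, \partial \hat{P}_b/\partial \delta)$. The minimizer of the sum is then $\tilde{\delta} - b/(2a)$ and the loss from using $\tilde{\delta}$ instead is $b^2/(4a)$, i.e., the squared covariance divided by the squared mean slope.

```python
def mse_loss_closed_form(cov_p_dp, mean_dp):
    """Loss cov^2 / E^2 incurred by minimizing only the quadratic term."""
    return cov_p_dp ** 2 / mean_dp ** 2

def mse_loss_direct(var0, cov_p_dp, mean_dp, delta_tilde):
    """Loss computed by explicitly minimizing the quadratic-plus-linear model."""
    a = mean_dp ** 2              # squared bias ~ a * (delta - delta_tilde)^2
    b = 2.0 * cov_p_dp            # variance ~ var0 + b * (delta - delta_tilde)

    def mse(d):
        return a * (d - delta_tilde) ** 2 + var0 + b * (d - delta_tilde)

    delta_opt = delta_tilde - b / (2.0 * a)    # minimizer of the sum
    return mse(delta_tilde) - mse(delta_opt), delta_opt

# Made-up numbers for illustration only
loss, delta_opt = mse_loss_direct(var0=0.02, cov_p_dp=-0.003,
                                  mean_dp=0.05, delta_tilde=1.5)
closed = mse_loss_closed_form(-0.003, 0.05)
print(loss, closed, delta_opt)
```

The loss depends on neither the variance level `var0` nor the location `delta_tilde`, only on the ratio of the covariance to the mean slope.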


The MSE loss (3.107) is upper bounded by invoking Lemma 3.5 and by utilizing (3.99) and (3.100). Therefore,

L(5opt,5opt)  =  0(m~p.  (3.108) 


Finally, in the limit when $m$ and $n$ grow large at the same rate,

$$L(\tilde{\delta}_{\mathrm{opt}}, \delta_{\mathrm{opt}}) \ll \mathrm{MSE}(\delta_{\mathrm{opt}}), \tag{3.109}$$

i.e., the MSE loss is negligible compared to the optimum MSE in the large $(m, n)$ regime.

3.7.3  Numerical  Validation 

The presented results are validated via Monte-Carlo simulations. We consider different scenarios with respect to the number of arrivals, their powers and directions of arrival.

A standard, vertical, linear array with a half-wavelength separation between the elements is considered in each scenario. In addition, spatially uncorrelated, zero-mean noise with a variance of one corrupts the signal snapshots.

Scenario 1: Two Closely Spaced Arrivals with High SNR

In this scenario, 2 signals are arriving at elevation angles of 90° and 92° and each has power 10 (i.e., the SNR is 10 dB). The array contains 30 sensors and 50 snapshots are used to estimate the sample correlation matrix. Note that $c = m/n = 0.6$ and the normalized trace of the ensemble correlation matrix is 21.
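The two scenario constants quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes steering vectors with unit-modulus entries, so that a source of power $P$ contributes $Pm$ to the trace of the ensemble correlation matrix and the noise contributes $\sigma^2 m$; the function name and its parameters are illustrative only.

```python
def scenario_parameters(num_sensors, num_snapshots, snrs_db, noise_var=1.0):
    """Return (c, normalized trace) for plane waves in uncorrelated noise.

    Assumes steering vectors with unit-modulus entries, so that each source
    of power P contributes P * m to trace(R) and the noise contributes
    noise_var * m.
    """
    c = num_sensors / num_snapshots
    powers = [noise_var * 10.0 ** (snr / 10.0) for snr in snrs_db]
    normalized_trace = sum(powers) + noise_var
    return c, normalized_trace


if __name__ == "__main__":
    # Scenario 1: 30 sensors, 50 snapshots, two arrivals at 10 dB SNR.
    print(scenario_parameters(30, 50, [10.0, 10.0]))
```

Under the same assumption, calling the function with 40 sensors, 25 snapshots and SNRs of 1 dB and 5 dB gives the values of $c$ and of the normalized trace for a snapshot-poor configuration.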

The simulated dependence of the squared bias, variance and MSE on diagonal loading is shown in Fig. 3-10 for the true power defined in a standard way (3.27) and in Fig. 3-11 when the true power is defined using an alternative approach (3.28). The steering angle is 87° and the plots show that the variance has almost no influence on the diagonal loading which minimizes the MSE.

The figures also show that the diagonal loading which optimizes the MSE corresponding to power estimator $\hat{P}_b$ is not greater than the loading which optimizes the MSE corresponding to $\hat{P}_a$. This follows from the facts that both power estimators are monotonically non-decreasing functions of $\delta$ and that $\hat{P}_b(\delta) \geq \hat{P}_a(\delta)$, together with the conjecture that the bias term controls the optimal diagonal loading.

Finally, note that the optimal loading for either estimator when the true power is defined in a standard way is smaller than the optimal loading when the power is defined in the alternative way. This is due to the combined effect of the facts that (1) both power estimators are non-decreasing functions of $\delta$, (2) the true power evaluated with (3.28) is greater than the true power evaluated in a standard way using (3.27) and (3) our conjecture that bias controls the value of optimal diagonal loading.

A similar set of plots corresponding to a steering angle of 89° is given in Fig. 3-12 and Fig. 3-13. As can be observed from Fig. 3-12, when the true power is defined in a standard way using (3.27), the optimal loading for both estimators is zero. This happens because the true power in such a direction (and in general in directions sufficiently close to the directions of arrival) is below the expected smallest value of either power estimator, achieved for zero loading. Therefore, since the power estimators are non-decreasing functions of the loading $\delta$, an unloaded power estimator minimizes the bias and hence, due to our conjecture, the estimation MSE. Intuitively, as the steering direction gets closer to but is not pointed exactly at the source, the optimal diagonal loading is reduced as the estimator needs to maintain more adaptability to null the source.

On the other hand, if the true power is evaluated using (3.28), the corresponding bias (and hence the estimation MSE) is minimized for non-zero loading because the expected smallest value of either power estimator is below the alternatively defined true power whenever $c < 1$.

The optimal MSE and the MSE evaluated at a loading which minimizes the squared bias are compared in Fig. 3-14 for power estimator $\hat{P}_a$ and true power evaluated according to (3.27). The plots highlight two different ranges of steering angles. As shown, the performance loss is larger when steering away from the main beams, but remains within 1 dB of the optimal MSE. A smaller performance loss is observed in Fig. 3-15, which corresponds to the estimator $\hat{P}_b$.

The corresponding comparisons for power estimators $\hat{P}_a$ and $\hat{P}_b$ and true power evaluated using (3.28) are shown in Figures 3-16 and 3-17. Again, the largest performance loss is observed for power estimator $\hat{P}_a$ when steering away from the main beams and it remains within 1 dB of the optimal MSE.

Finally, note that the optimized estimator $\hat{P}_b$ outperforms the optimized estimator $\hat{P}_a$. The comparison between the power estimators is given in Section 3.9.


Figure 3-10: Scenario 1: Squared bias, variance and MSE corresponding to estimators $\hat{P}_a$ and $\hat{P}_b$ versus diagonal loading for steering angle of 87°. True power uses the traditional definition as given in (3.27). Panels: power estimator (a), power estimator (b).

Figure 3-11: Scenario 1: Squared bias, variance and MSE corresponding to estimators $\hat{P}_a$ and $\hat{P}_b$ versus diagonal loading for steering angle of 87°. True power uses the alternative definition as given in (3.28). Panels: power estimator (a), power estimator (b).

Figure 3-12: Scenario 1: Squared bias, variance and MSE corresponding to estimators $\hat{P}_a$ and $\hat{P}_b$ versus diagonal loading for steering angle of 89°. True power uses the traditional definition as given in (3.27).

Figure 3-13: Scenario 1: Squared bias, variance and MSE corresponding to estimators $\hat{P}_a$ and $\hat{P}_b$ versus diagonal loading for steering angle of 89°. True power uses the alternative definition as given in (3.28).

Figure 3-14: Scenario 1: $\mathrm{MSE}\big(\hat{P}_a(\tilde{\delta}_{\mathrm{opt}})\big)$ and $\mathrm{MSE}\big(\hat{P}_a(\delta_{\mathrm{opt}})\big)$ versus steering angle. True power uses the traditional definition as given in (3.27).

Figure 3-15: Scenario 1: $\mathrm{MSE}\big(\hat{P}_b(\tilde{\delta}_{\mathrm{opt}})\big)$ and $\mathrm{MSE}\big(\hat{P}_b(\delta_{\mathrm{opt}})\big)$ versus steering angle. True power uses the traditional definition as given in (3.27).

Figure 3-16: Scenario 1: $\mathrm{MSE}\big(\hat{P}_a(\tilde{\delta}_{\mathrm{opt}})\big)$ and $\mathrm{MSE}\big(\hat{P}_a(\delta_{\mathrm{opt}})\big)$ versus steering angle. True power uses the alternative definition as given in (3.28).

Figure 3-17: Scenario 1: $\mathrm{MSE}\big(\hat{P}_b(\tilde{\delta}_{\mathrm{opt}})\big)$ and $\mathrm{MSE}\big(\hat{P}_b(\delta_{\mathrm{opt}})\big)$ versus steering angle. True power uses the alternative definition as given in (3.28).
Scenario 2: Separated Arrivals with Low SNR in a Snapshot Poor Regime


In this scenario, 2 signals are arriving at elevation angles of 90° and 94° with respect to the broadside of the array. Their SNRs are respectively 1 dB and 5 dB. The array has 40 sensors and the number of snapshots available to estimate the SCM is 25. Note that $c = 1.6$ and the normalized trace of the ensemble correlation matrix is 5.42.

The simulated dependences of the squared bias, variance and MSE on diagonal loading for estimators $\hat{P}_a$ and $\hat{P}_b$, true powers evaluated using (3.27) and (3.28) and steering angle of 94.5° are shown respectively in Figures 3-18 and 3-19. The same set of plots corresponding to a steering angle of 87° is shown in Figures 3-20 and 3-21. The plots confirm that the variance has almost no influence on the diagonal loading which minimizes the MSE.

Note that when steering close to the source direction, the MSE performance is optimized for non-zero loading, as shown in Fig. 3-18. This is because the parameter $c = m/n$ is above 1 (i.e., the SCM is rank deficient) so that as $\delta \to 0$, the estimators $\hat{P}_a$ and $\hat{P}_b$ converge to 0. Thus, as long as the true power is non-zero, the optimal diagonal loading is positive.
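The collapse of both estimators as $\delta \to 0$ for a rank-deficient SCM can be illustrated on a toy example. The sketch below is not the simulation used in the thesis. It assumes the estimators take the forms $\hat{P}_a = \mathbf{w}^H \hat{\mathbf{R}} \mathbf{w}$ and $\hat{P}_b = \mathbf{w}^H (\hat{\mathbf{R}} + \delta \mathbf{I}) \mathbf{w}$ with $\mathbf{w} = (\hat{\mathbf{R}} + \delta \mathbf{I})^{-1}\mathbf{v} / \big(\mathbf{v}^H (\hat{\mathbf{R}} + \delta \mathbf{I})^{-1}\mathbf{v}\big)$, since their definitions lie outside this section; the snapshots, the steering vector and all function names are made up for illustration.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x


def sample_correlation(snapshots):
    """SCM (1/n) * sum_i x_i x_i^H from a list of snapshot vectors."""
    n, m = len(snapshots), len(snapshots[0])
    return [[sum(x[i] * x[j].conjugate() for x in snapshots) / n
             for j in range(m)] for i in range(m)]


def power_estimates(scm, v, delta):
    """Return (P_a, P_b) under the assumed forms w^H R w and w^H (R + delta I) w."""
    m = len(v)
    loaded = [[scm[i][j] + (delta if i == j else 0.0) for j in range(m)]
              for i in range(m)]
    y = solve(loaded, v)  # y = (R + delta I)^{-1} v
    q1 = sum(vi.conjugate() * yi for vi, yi in zip(v, y)).real
    ry = [sum(scm[i][j] * y[j] for j in range(m)) for i in range(m)]
    p_a = sum(yi.conjugate() * ri for yi, ri in zip(y, ry)).real / q1 ** 2
    p_b = 1.0 / q1
    return p_a, p_b


if __name__ == "__main__":
    # Two made-up snapshots for a 4-sensor array: n < m, so the SCM has rank 2.
    snapshots = [[1, 1j, -1, 0.5], [0.5, -1, 1j, 1]]
    scm = sample_correlation(snapshots)
    v = [1 + 0j] * 4  # hypothetical steering vector
    for delta in (1.0, 1e-1, 1e-2, 1e-3):
        print(delta, power_estimates(scm, v, delta))
```

In this toy case both printed estimates shrink as the loading decreases, and the second one never falls below the first.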

As in Scenario 1, the diagonal loading which optimizes the MSE corresponding to $\hat{P}_b$ does not exceed the loading which optimizes the MSE corresponding to $\hat{P}_a$. Also, the optimal loading pertaining to true power evaluated using (3.27) is smaller than the optimal loading pertaining to true power evaluated with (3.28) for both estimators.

The optimal MSE and the MSE evaluated at the minimizer $\tilde{\delta}_{\mathrm{opt}}$ of the squared bias are compared for estimators $\hat{P}_a$ and $\hat{P}_b$ when the true power is evaluated with (3.27) in Figures 3-22 and 3-23. The corresponding comparisons when the true power is evaluated with (3.28) are given in Figures 3-24 and 3-25.

As can be observed from the plots, the largest performance loss appears with estimator $\hat{P}_a$ when steering away from the main beams. However, this loss is still within 1 dB. The performance loss corresponding to estimator $\hat{P}_b$ is negligible. The theoretical characterization of this loss is given in Section 3.7.2.

As a final remark, the optimized estimator $\hat{P}_b$ outperforms the optimized estimator $\hat{P}_a$. The comparison between the estimators is treated more formally in Section 3.9.

Figure 3-18: Scenario 2: Squared bias, variance and MSE corresponding to estimators $\hat{P}_a$ and $\hat{P}_b$ versus diagonal loading for steering angle of 94.5°. True power uses the traditional definition as given in (3.27).

Figure 3-19: Scenario 2: Squared bias, variance and MSE corresponding to estimators $\hat{P}_a$ and $\hat{P}_b$ versus diagonal loading for steering angle of 94.5°. True power uses the alternative definition as given in (3.28).

Figure 3-20: Scenario 2: Squared bias, variance and MSE corresponding to estimators $\hat{P}_a$ and $\hat{P}_b$ versus diagonal loading for steering angle of 87°. True power uses the traditional definition as given in (3.27).

Figure 3-21: Scenario 2: Squared bias, variance and MSE corresponding to estimators $\hat{P}_a$ and $\hat{P}_b$ versus diagonal loading for steering angle of 87°. True power uses the alternative definition as given in (3.28).

Figure 3-22: Scenario 2: $\mathrm{MSE}\big(\hat{P}_a(\tilde{\delta}_{\mathrm{opt}})\big)$ and $\mathrm{MSE}\big(\hat{P}_a(\delta_{\mathrm{opt}})\big)$ versus steering angle. True power uses the traditional definition as given in (3.27).


3.8  Behavior  of  Optimal  Diagonal  Loading 


The  diagonal  loading  which  minimizes  the  squared  bias  and  approximately  minimizes 
the  estimation  MSE  is  analyzed  in  this  section  and  it  is  shown  that  the  optimal 
diagonal  loading  depends  on  the  steering  direction.  The  analysis  exploits  the  results 
from  Section  3.4  pertaining  to  the  behavior  of  the  squared  bias. 

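Figure 3-23: Scenario 2: $\mathrm{MSE}\big(\hat{P}_b(\tilde{\delta}_{\mathrm{opt}})\big)$ and $\mathrm{MSE}\big(\hat{P}_b(\delta_{\mathrm{opt}})\big)$ versus steering angle. True power uses the traditional definition as given in (3.27).

Figure 3-24: Scenario 2: $\mathrm{MSE}\big(\hat{P}_a(\tilde{\delta}_{\mathrm{opt}})\big)$ and $\mathrm{MSE}\big(\hat{P}_a(\delta_{\mathrm{opt}})\big)$ versus steering angle. True power uses the alternative definition as given in (3.28).

Figure 3-25: Scenario 2: $\mathrm{MSE}\big(\hat{P}_b(\tilde{\delta}_{\mathrm{opt}})\big)$ and $\mathrm{MSE}\big(\hat{P}_b(\delta_{\mathrm{opt}})\big)$ versus steering angle. True power uses the alternative definition as given in (3.28).

In general, for a given steering direction $\mathbf{v}_s$, the optimal loading $\tilde{\delta}_{\mathrm{opt}}$ which minimizes the squared bias is for estimator $\hat{P}_a$

$$\tilde{\delta}_{\mathrm{opt}}^{(a)} \begin{cases} = 0, & \text{if } P \leq \mathrm{E}\big[\hat{P}_a(0)\big] \\ \in (0, \infty), & \text{if } \mathrm{E}\big[\hat{P}_a(0)\big] < P < \lim_{\delta \to \infty} \mathrm{E}\big[\hat{P}_a(\delta)\big] \end{cases} \tag{3.110}$$

and for estimator $\hat{P}_b$,

$$\tilde{\delta}_{\mathrm{opt}}^{(b)} \begin{cases} = 0, & \text{if } P \leq \mathrm{E}\big[\hat{P}_b(0)\big] \\ \in (0, \infty), & \text{if } P > \mathrm{E}\big[\hat{P}_b(0)\big] \end{cases} \tag{3.111}$$

where $P$ is the true power in the direction described by $\mathbf{v}_s$ with respect to which the MSE of a power estimator is optimized. The limiting expectations $\mathrm{E}\big[\hat{P}(0)\big]$ and $\lim_{\delta \to \infty} \mathrm{E}\big[\hat{P}_a(\delta)\big]$ are evaluated using the results of Lemma 3.2.

In addition, invoking that $\hat{P}_b(\delta) \geq \hat{P}_a(\delta)$, we immediately conclude that for a given steering direction and the true power $P$ associated with it,

$$\tilde{\delta}_{\mathrm{opt}}^{(b)} \leq \tilde{\delta}_{\mathrm{opt}}^{(a)}, \tag{3.112}$$

with equality for $P \leq \mathrm{E}\big[\hat{P}(0)\big]$ (in which case, these loadings are equal to zero).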
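The case analysis in (3.110) and (3.111) amounts to locating where a non-decreasing expected-power curve crosses the level of the true power. A minimal sketch of that search is given below; the two toy curves, the cap `delta_max` and the function names are hypothetical and only mimic the qualitative shapes described in the text (one curve saturates, the other dominates it and grows without bound).

```python
import math


def bias_minimizing_loading(expected_power, true_power, delta_max=1e6, tol=1e-10):
    """Loading that minimizes |E[P_hat(delta)] - P| for a non-decreasing curve.

    expected_power: callable delta -> E[P_hat(delta)], assumed non-decreasing.
    Returns 0.0 if P <= E[P_hat(0)], math.inf if the curve stays below P up
    to delta_max (a numerical stand-in for the delta -> infinity limit), and
    otherwise the root of E[P_hat(delta)] = P found by bisection.
    """
    if true_power <= expected_power(0.0):
        return 0.0
    if expected_power(delta_max) < true_power:
        return math.inf
    lo, hi = 0.0, delta_max
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if expected_power(mid) < true_power:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy curves (not from the thesis): E_a saturates, E_b >= E_a and is unbounded.
def toy_expected_a(delta):
    return 1.0 + 2.0 * delta / (1.0 + delta)


def toy_expected_b(delta):
    return toy_expected_a(delta) + 0.5 * delta


if __name__ == "__main__":
    for P in (0.5, 2.0, 3.5):
        print(P, bias_minimizing_loading(toy_expected_a, P),
              bias_minimizing_loading(toy_expected_b, P))
```

For the toy curves, the loading found for the unbounded curve never exceeds the one found for the saturating curve, in line with (3.112), and only the saturating curve can return an infinite loading.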

In the following parts, arrival models consisting of a single and multiple plane waves embedded into spatially uncorrelated noise (i.e., the noise is isotropic and the sensors are separated by half a wavelength) are considered. The behavior of the optimal diagonal loading is analyzed with respect to the standard definition of true power in the first part of this section. The analysis corresponding to the alternative definition of true power is given in the second part. The last part validates the obtained results via simulations.

Table 3.1: Expected Power Estimators in the Limit


3.8.1  Single  Source  in  Uncorrelated  Noise 

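An ensemble correlation matrix $\mathbf{R}$ of the received snapshots originating from a single source transmitting a signal of power $P_0$ in the direction described by $\mathbf{v}_0$ and embedded into uncorrelated noise of variance $\sigma^2$ is

$$\mathbf{R} = P_0 \mathbf{v}_0 \mathbf{v}_0^H + \sigma^2 \mathbf{I}. \tag{3.113}$$

Using the matrix inversion lemma, the inverse of the spatial correlation matrix can be written as

$$\mathbf{R}^{-1} = \frac{1}{\sigma^2}\left(\mathbf{I} - \frac{P_0 \mathbf{v}_0 \mathbf{v}_0^H}{\sigma^2 + m P_0}\right). \tag{3.114}$$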
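The closed-form inverse (3.114) can be verified numerically by multiplying it with (3.113). The sketch below assumes a steering vector with unit-modulus entries, so that $\mathbf{v}_0^H \mathbf{v}_0 = m$; the phase convention $e^{j\pi k \cos\theta}$ and the function names are illustrative choices, not taken from the thesis.

```python
import cmath
import math


def steering_vector(theta_deg, m):
    """Unit-modulus steering vector of a half-wavelength-spaced linear array.

    Assumes the phase convention exp(j*pi*k*cos(theta)); any vector with
    v^H v = m would serve equally well for this check.
    """
    u = math.cos(math.radians(theta_deg))
    return [cmath.exp(1j * math.pi * k * u) for k in range(m)]


def single_source_R(P0, v0, sigma2):
    """R = P0 * v0 v0^H + sigma2 * I, as in (3.113)."""
    m = len(v0)
    return [[P0 * v0[i] * v0[j].conjugate() + (sigma2 if i == j else 0.0)
             for j in range(m)] for i in range(m)]


def single_source_R_inverse(P0, v0, sigma2):
    """Closed-form inverse (3.114), valid when v0^H v0 = m."""
    m = len(v0)
    scale = P0 / (sigma2 + m * P0)
    return [[((1.0 if i == j else 0.0) - scale * v0[i] * v0[j].conjugate()) / sigma2
             for j in range(m)] for i in range(m)]


def max_deviation_from_identity(A, B):
    """Largest entry of |A @ B - I|."""
    m = len(A)
    worst = 0.0
    for i in range(m):
        for j in range(m):
            entry = sum(A[i][k] * B[k][j] for k in range(m))
            worst = max(worst, abs(entry - (1.0 if i == j else 0.0)))
    return worst


if __name__ == "__main__":
    v0 = steering_vector(90.0, 30)  # source at 90 degrees, 30 sensors
    R = single_source_R(10.0, v0, 1.0)
    R_inv = single_source_R_inverse(10.0, v0, 1.0)
    print(max_deviation_from_identity(R, R_inv))
```

The printed deviation is at the level of floating-point rounding, which confirms that the product of the two matrices is the identity.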


Substituting (3.113) and (3.114) into the expressions of Lemma 3.2, the results concerning the limiting behavior of the power estimators for $\delta = 0$ and $\delta \to \infty$ when steering in and off the source direction $\mathbf{v}_0$ and when $n > m$ are obtained and summarized in Table 3.1. Recall that the power estimator $\hat{P}_b$ is unbounded when $\delta \to \infty$ and this case is not shown in the table. Note that when $m > n$ and $\delta = 0$, the SCM has zero eigenvalues so that the power estimate in the steering direction becomes zero unless some sort of subspace processing is employed.
From the results in Table 3.1, true power definition (3.115) and characterizations (3.110) and (3.111), we deduce some facts about the optimal loading $\tilde{\delta}_{\mathrm{opt}}$. Namely, when steering in the source direction such that $\mathbf{v}_s = \mathbf{v}_0$, the squared bias pertaining to $\hat{P}_a$ is minimized (canceled out) when $\delta \to \infty$, i.e., the optimal processor in this case is the matched filter. Since $\hat{P}_b(\delta)$ is unbounded when $\delta \to \infty$, the optimal loading $\tilde{\delta}_{\mathrm{opt}}^{(b)}$ is always finite.

On the other hand, when steering off the source direction such that $\mathbf{v}_s \neq \mathbf{v}_0$, the optimal loading $\tilde{\delta}_{\mathrm{opt}}$ depends on the value of the inner product $\mathbf{v}_s^H \mathbf{v}_0$. Namely, from (3.110) and (3.111), the squared bias corresponding to both $\hat{P}_a$ and $\hat{P}_b$ is minimized for zero loading when $P = \sigma^2/m \leq \mathrm{E}\big[\hat{P}(0)\big]$. This condition is, in the $n > m$ regime and using the result from Table 3.1, equivalent to


H  ^  /  9 

^  A  ^  < 

m  ~  n  \  mPo 


(3.116) 


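For the case when $\mathbf{v}_s$ and $\mathbf{v}_0$ are orthogonal, the true power $P = \sigma^2/m$ coincides with $\lim_{\delta \to \infty} \mathrm{E}\big[\hat{P}_a(\delta)\big]$ and consequently, $\tilde{\delta}_{\mathrm{opt}}^{(a)} = \infty$. In other words, when $\mathbf{v}_s^H \mathbf{v}_0 = 0$, the matched filter completely removes the interference originating from the source, meaning that it is the optimal processor. In addition, as $\mathbf{v}_s$ moves away from $\mathbf{v}_0$, the inner product $|\mathbf{v}_s^H \mathbf{v}_0|$ tends to decrease such that $\lim_{\delta \to \infty} \mathrm{E}\big[\hat{P}_a(\delta)\big]$ approaches the level of the true power $P$. Hence, the optimal loading $\tilde{\delta}_{\mathrm{opt}}^{(a)}$ tends to increase. The behavior of the optimal loading $\tilde{\delta}_{\mathrm{opt}}^{(b)}$ corresponding to $\hat{P}_b$ exhibits similar trends, with the distinction that it is always finite.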

The preceding observations are summarized in Table 3.2. A graphical visualization of the results is given in Fig. 3-26, where the true power $P$ and the expected values of the estimator $\hat{P}_a$ when $\delta = 0$ and $\delta \to \infty$, for a range of steering angles, are shown. The specific scenario used to generate Fig. 3-26 is described in its caption. A value of $\mathrm{E}\big[\hat{P}_a(\delta)\big]$ is always between $\mathrm{E}\big[\hat{P}_a(0)\big]$ and $\mathrm{E}\big[\hat{P}_a(\infty)\big]$ because $\hat{P}_a$ monotonically increases. For a given steering angle, if $P$ is above $\mathrm{E}\big[\hat{P}_a(0)\big]$, then as $\delta$ changes from 0 to $\infty$ the expected estimate moves from the $\mathrm{E}\big[\hat{P}_a(0)\big]$ curve, intersects the curve representing the true power $P$ and asymptotically approaches the $\mathrm{E}\big[\hat{P}_a(\infty)\big]$ curve. As a rule of thumb, whenever the true power $P$ gets close to $\mathrm{E}\big[\hat{P}_a(\infty)\big]$, the optimal loading $\tilde{\delta}_{\mathrm{opt}}^{(a)}$ becomes large. When $P$ is below $\mathrm{E}\big[\hat{P}_a(0)\big]$, the squared bias is minimized for $\delta = 0$. A graphical representation corresponding to the estimator $\hat{P}_b$ is similar, with the difference that $\mathrm{E}\big[\hat{P}_b(\delta)\big]$ blows up as $\delta \to \infty$ such that $\tilde{\delta}_{\mathrm{opt}}^{(b)}$ is always finite.

Table 3.2: Properties of $\tilde{\delta}_{\mathrm{opt}}$

                 v_s = v_0     v_s ≠ v_0
  δ̃_opt^(a)      ∞             ∞, if v_s^H v_0 = 0
                               0, if condition (3.116)
                               ∈ (0, ∞), otherwise
  δ̃_opt^(b)      finite        0, if condition (3.116)
                               ∈ (0, ∞), otherwise

The dependence of the optimal loading $\tilde{\delta}_{\mathrm{opt}}^{(a)}$ on steering angle is shown in Fig. 3-27. As expected from Fig. 3-26, the optimal loading oscillates and increases as the steering angle moves away from the source direction. The optimal loading $\tilde{\delta}_{\mathrm{opt}}^{(a)}$ is clipped at 200. In contrast, the optimal loading $\tilde{\delta}_{\mathrm{opt}}^{(b)}$ for the considered example is 15.5 in the source direction and does not exceed the value of 0.38 when steering off the source direction. This happens because $\hat{P}_b$ is much larger than $\hat{P}_a$ when $\delta$ is moderately greater than zero such that $\mathrm{E}\big[\hat{P}_b(\delta)\big]$ intersects the level of $P$ much sooner than $\mathrm{E}\big[\hat{P}_a(\delta)\big]$ does.


3.8.2  Multiple  Sources  in  Uncorrelated  Noise 


Now we consider a case of $M$ sources, each transmitting a signal of power $P_k$ in the direction specified by $\mathbf{v}_k$, $k = 1, \ldots, M$, such that the ensemble correlation matrix of the received snapshots is

$$\mathbf{R} = \sum_{k=1}^{M} P_k \mathbf{v}_k \mathbf{v}_k^H + \sigma^2 \mathbf{I}, \tag{3.117}$$

where $\sigma^2$ is the variance of the uncorrelated noise.

Footnote: This model is called separable sources in [52].

Figure 3-26: Single source in uncorrelated noise. True power according to (3.115), $\mathrm{E}\big[\hat{P}_a(0)\big]$ and $\mathrm{E}\big[\hat{P}_a(\infty)\big]$. A signal has power 10 and arrives from 90°. Noise is uncorrelated and its variance is 1. The array has 30 sensors and 50 snapshots are available.

Figure 3-27: Optimal loading $\tilde{\delta}_{\mathrm{opt}}^{(a)}$. A signal is of power 10 and arrives from 90°. Noise is uncorrelated, of power 1. The diagonal loading is clipped at 200. Whenever the steering and source directions are the same or orthogonal, $\tilde{\delta}_{\mathrm{opt}}^{(a)} = \infty$.

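The limiting expectation of the estimator $\hat{P}_a$ when the steering direction $\mathbf{v}_s \neq \mathbf{v}_i$, $i = 1, \ldots, M$, is, using Lemma 3.2, given by

$$\lim_{\delta \to \infty} \mathrm{E}\big[\hat{P}_a(\delta)\big] = \sum_{k=1}^{M} P_k \frac{|\mathbf{v}_s^H \mathbf{v}_k|^2}{m^2} + \frac{\sigma^2}{m}. \tag{3.118}$$

Therefore, since in general $\lim_{\delta \to \infty} \mathrm{E}\big[\hat{P}_a(\delta)\big] > P = \sigma^2/m$, the optimal loading $\tilde{\delta}_{\mathrm{opt}}^{(a)}$ is finite. In a special case when $\mathbf{v}_s$ is orthogonal to each of $\mathbf{v}_i$, $i = 1, \ldots, M$, the optimal beamformer is a matched filter.

Similarly, when steering in a source direction,

$$\lim_{\delta \to \infty} \mathrm{E}\big[\hat{P}_a(\delta)\big] = P_1 + \sum_{k=2}^{M} P_k \frac{|\mathbf{v}_1^H \mathbf{v}_k|^2}{m^2} + \frac{\sigma^2}{m}, \tag{3.119}$$

where without loss of generality $\mathbf{v}_s = \mathbf{v}_1$. Again, in general, $\lim_{\delta \to \infty} \mathrm{E}\big[\hat{P}_a(\delta)\big] > P = P_1 + \sigma^2/m$ so that the optimal loading $\tilde{\delta}_{\mathrm{opt}}^{(a)}$ is finite. In a special case when $\mathbf{v}_1^H \mathbf{v}_i = 0$ for $i = 2, \ldots, M$, the optimal estimator is a matched filter.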
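The right-hand sides of (3.118) and (3.119) equal the power $\mathbf{v}_s^H \mathbf{R} \mathbf{v}_s / m^2$ that a unit-gain matched filter $\mathbf{w} = \mathbf{v}_s/m$ measures on the ensemble matrix (3.117). The following sketch checks this identity numerically. It assumes a half-wavelength linear array with the phase convention $e^{j\pi k \cos\theta}$ and uses source parameters resembling Scenario 1; the function names and the convention are illustrative assumptions.

```python
import cmath
import math


def steering_vector(theta_deg, m):
    """Assumed convention: half-wavelength linear array, angle from the array axis."""
    u = math.cos(math.radians(theta_deg))
    return [cmath.exp(1j * math.pi * k * u) for k in range(m)]


def inner(a, b):
    """a^H b for two equal-length complex vectors."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def matched_filter_limit(theta_s_deg, sources, m, sigma2):
    """Right-hand side of (3.118): sum_k P_k |v_s^H v_k|^2 / m^2 + sigma2 / m.

    sources is a list of (power, angle in degrees). When the steering angle
    equals a source angle, |v_s^H v_k|^2 / m^2 = 1 for that source and the
    expression becomes (3.119).
    """
    v_s = steering_vector(theta_s_deg, m)
    total = sigma2 / m
    for power, angle in sources:
        v_k = steering_vector(angle, m)
        total += power * abs(inner(v_s, v_k)) ** 2 / m ** 2
    return total


def quadratic_form_limit(theta_s_deg, sources, m, sigma2):
    """v_s^H R v_s / m^2 with R built explicitly from (3.117)."""
    v_s = steering_vector(theta_s_deg, m)
    vs = [(p, steering_vector(a, m)) for p, a in sources]
    total = 0.0
    for i in range(m):
        for j in range(m):
            r_ij = sum(p * v[i] * v[j].conjugate() for p, v in vs)
            if i == j:
                r_ij += sigma2
            total += (v_s[i].conjugate() * r_ij * v_s[j]).real
    return total / m ** 2


if __name__ == "__main__":
    scenario_1 = [(10.0, 90.0), (10.0, 92.0)]  # powers and arrival angles
    for angle in (87.0, 90.0):
        print(angle,
              matched_filter_limit(angle, scenario_1, 30, 1.0),
              quadratic_form_limit(angle, scenario_1, 30, 1.0))
```

Both columns agree for each steering angle. When the steering vector is orthogonal to every source vector, only the noise term $\sigma^2/m$ remains, which is the special case in which the matched filter is optimal.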

Certainly, when steering slightly away from the source direction, the true power $P = \sigma^2/m$ is below the smallest expected estimate achieved for $\delta = 0$. The squared bias is minimized for $\delta = 0$ in such a case. A general characterization of the steering directions for which $\tilde{\delta}_{\mathrm{opt}}^{(a)} = 0$ is cumbersome due to the lack of a compact expression for $\mathbf{R}^{-1}$. Therefore, we need to resort to a specific assignment of the arriving directions $\mathbf{v}_i$ and powers $P_i$.

A graphical visualization of the stated results is given in Fig. 3-28 where two arrivals have the same power and are closely spaced. A similar discussion as given for a single source case applies here as well. A plot of the optimal loading $\tilde{\delta}_{\mathrm{opt}}^{(a)}$ is shown in Fig. 3-29. Again, when non-zero, the optimal loading $\tilde{\delta}_{\mathrm{opt}}^{(b)}$ is much smaller than $\tilde{\delta}_{\mathrm{opt}}^{(a)}$ and does not exceed the value of 0.38 when steering off the main beams.
Figure 3-28: True power according to (3.27), $\mathrm{E}\big[\hat{P}_a(0)\big]$, $\mathrm{E}\big[\hat{P}_a(\infty)\big]$ and the expected optimized power estimate obtained via Monte-Carlo simulations. Scenario 1 is considered, namely, each of two plane waves has power 10 and they arrive at 90° and 92° to the broadside. The noise is uncorrelated and its variance is 1. The array is linear, $\lambda/2$ spaced and has 30 sensors. 50 snapshots are available.

Figure 3-29: Optimal loading $\tilde{\delta}_{\mathrm{opt}}^{(a)}$ for Scenario 1. Panel title: minimizer of squared bias pertaining to estimator (a). Horizontal axis: steering angle in degrees.
3.8.3 Alternative Definition of True Power

In the example of a single source in uncorrelated noise, it can be shown that if the array is steered in the source direction, then the power $P$ as defined in (3.28) is

$$P = \lim_{\delta \to \infty} \mathrm{E}\big[\hat{P}_a(\delta)\big], \tag{3.120}$$

i.e., the optimal loading $\tilde{\delta}_{\mathrm{opt}}^{(a)} \to \infty$. In comparison, the squared bias corresponding to $\hat{P}_b$ is always minimized for finite loading $\tilde{\delta}_{\mathrm{opt}}^{(b)}$.

A  more  interesting  fact  is  to  observe  using  Lemma  3.2  that 

where  c  <  1  is  implicitly  assumed  by  considering  the  unloaded  estimator. 

This means that unless $c = 0$, when no loading is needed, the optimal loading $\tilde{\delta}_{\mathrm{opt}}$ is positive. This is graphically shown in Fig. 3-30, which corresponds to the same scenario as in Fig. 3-28. Since the true power is always greater than $\mathrm{E}\big[\hat{P}_a(0)\big]$, the optimal loading $\tilde{\delta}_{\mathrm{opt}}^{(a)}$ is always non-zero.

Note that when steering away from the source direction, the powers defined in (3.27) and (3.28) converge such that the associated optimal loadings coincide. On the contrary, when steering close to the source direction, diagonal loadings optimized with respect to the two definitions significantly differ.

3.8.4  Numerical  Validation 

This section validates the results which describe the behavior of the optimal diagonal loading when the arrival process is composed of plane waves. A standard, vertical, linear array with $\lambda/2$ separation of the elements is considered. In addition, spatially uncorrelated, zero-mean noise with a variance of one corrupts the signal snapshots. Two different scenarios are considered.
Figure 3-30: Two sources in uncorrelated noise, for Scenario 1.


Scenario  1 

In this scenario, 2 signals are arriving at elevation angles of 90° and 92° and each has power 10 (i.e., the SNR is 10 dB). The array contains 30 sensors and 50 snapshots are used to estimate the sample correlation matrix.

To validate the theoretical findings in this section, the simulated optimal diagonal loadings versus steering angle are plotted in Fig. 3-31 for true power evaluated using the standard approach (3.27). As shown, there is a range of steering directions for which a nearly diagonally unloaded estimator achieves the lowest MSE. The smallest
diagonal  loading  used  in  the  simulation  tests  is  10“^  and  is  optimal  when  steering 
close to main beams. In addition, the estimator $\hat{P}_b$ is optimized at smaller values of diagonal loading which do not vary significantly across the steering directions.

The plots of simulated optimal diagonal loadings versus steering angle for true power evaluated using the alternative approach (3.28) are shown in Fig. 3-32. As theoretically elaborated in previous parts, when the alternative definition of true power is used, the performance is optimized for positive loadings.
Figure 3-31: Scenario 1: $\delta_{\mathrm{opt}}$ and $\tilde{\delta}_{\mathrm{opt}}$ versus steering angle for $\hat{P}_a$ and $\hat{P}_b$. True power uses the traditional definition as given in (3.27).

Scenario  2 

In this scenario, 2 signals are arriving at elevation angles of 90° and 94° with respect to the broadside of the array. Their SNRs are respectively 1 dB and 5 dB. The array has 40 sensors and the number of snapshots available to estimate the SCM is 25.

The plots of the simulated optimal diagonal loading versus steering angle for true powers evaluated using both the standard and alternative approaches are respectively shown in Figures 3-33 and 3-34. As expected, the optimal diagonal loading is always positive because in this scenario, $c > 1$ (i.e., the SCM is rank deficient). However, even though the SCM is rank deficient in this case, the optimal diagonal loading when steering close to source directions is small. This is in accordance with our earlier observation that when steering close to main beams, the optimal estimator tends to perform adaptation as much as possible, fully relying on the available data.

As a rule of thumb, it has been observed that the sensitivities of the MSE and squared bias around their minimizers decrease as $\delta_{\mathrm{opt}}$ and $\tilde{\delta}_{\mathrm{opt}}$ get large. This effect, together with the variability in the simulated curves, causes a disagreement in the peaks of the simulated $\delta_{\mathrm{opt}}^{(a)}$ and $\tilde{\delta}_{\mathrm{opt}}^{(a)}$. In addition, this likely contributes to the performance loss observed in the simulations (such as in Figures 3-14 and 3-16) when steering away from the sources. As an illustration, the dependence of the squared bias, variance and MSE corresponding to power estimator $\hat{P}_a$ on diagonal loading when steering in the direction of 76° is shown in Fig. 3-35.

Figure 3-32: Scenario 1: $\delta_{\mathrm{opt}}$ and $\tilde{\delta}_{\mathrm{opt}}$ versus steering angle for $\hat{P}_a$ and $\hat{P}_b$. True power uses the alternative definition as given in (3.28).


3.9  Comparison  between  Power  Estimators 

The optimized mean square errors and sensitivities to optimal diagonal loading of the power estimators $\hat{P}_a$ and $\hat{P}_b$ are compared in this section. The theoretical result concerning the estimation performance is stated and proved in the first part. The sensitivities of the power estimators to optimal diagonal loading are compared in the second part. The simulation results presented in the last part validate the theoretical finding.

Figure 3-33: Scenario 2: $\delta_{\mathrm{opt}}$ and $\tilde{\delta}_{\mathrm{opt}}$ versus steering angle for $\hat{P}_a$ and $\hat{P}_b$. True power uses the traditional definition as given in (3.27).


3.9.1  Comparison  based  on  Estimation  Performance 

Figure 3-34: Scenario 2: $\delta_{\mathrm{opt}}$ and $\tilde{\delta}_{\mathrm{opt}}$ versus steering angle for $\hat{P}_a$ and $\hat{P}_b$. True power uses the alternative definition as given in (3.28).

Figure 3-35: Scenario 2: Squared bias, variance and MSE for $\hat{P}_a$ versus diagonal loading for steering angle of 76°.

In comparing the MSE performance of power estimators, we rely on the conjecture that the difference between the MSEs evaluated at the optimal loading $\delta_{\mathrm{opt}}$ and the loading $\tilde{\delta}_{\mathrm{opt}}$ which minimizes the squared bias is negligible for both estimators. Therefore, we first compare the MSEs of the two estimators evaluated at respectively $\tilde{\delta}_{\mathrm{opt}}^{(a)}$ and $\tilde{\delta}_{\mathrm{opt}}^{(b)}$.

The following result establishes that estimator $\hat{P}_a$ has a variance at least as large as that of the estimator $\hat{P}_b$ when both variances are measured at loadings which minimize the corresponding squared biases.

Lemma 3.7. Denoting by $\tilde{\delta}_{\mathrm{opt}}^{(a)}$ and $\tilde{\delta}_{\mathrm{opt}}^{(b)}$ loadings which minimize the squared biases of respectively $\hat{P}_a$ and $\hat{P}_b$ for some steering direction described by $\mathbf{v}_s$ and true power $P < \lim_{\delta \to \infty} \mathrm{E}\big[\hat{P}_a(\delta)\big]$, the corresponding variances are related as

$$\mathrm{var}\Big(\hat{P}_a\big(\tilde{\delta}_{\mathrm{opt}}^{(a)}\big)\Big) \geq \mathrm{var}\Big(\hat{P}_b\big(\tilde{\delta}_{\mathrm{opt}}^{(b)}\big)\Big). \tag{3.122}$$


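Proof. In a trivial case when $P \leq \mathrm{E}\big[\hat{P}(0)\big]$, the two biases are minimized for $\delta = 0$ so that (3.122) follows directly with the equality sign.


A slightly more involved case arises when $P > \mathrm{E}\big[\hat{P}(0)\big]$. To simplify the exposition, the notation $Q_{k,a} = Q_k\big(\tilde{\delta}_{\mathrm{opt}}^{(a)}\big)$ and $Q_{k,b} = Q_k\big(\tilde{\delta}_{\mathrm{opt}}^{(b)}\big)$ is introduced.


Due to the assumption $P < \lim_{\delta \to \infty} \mathrm{E}\big[\hat{P}_a(\delta)\big]$, it follows that $\tilde{\delta}_{\mathrm{opt}}^{(a)}$ and $\tilde{\delta}_{\mathrm{opt}}^{(b)}$ null out the corresponding biases so that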


p,  1  r(a)  Q‘i,c 


(3.123) 


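In addition, from (3.112), $\tilde{\delta}_{\mathrm{opt}}^{(a)} \geq \tilde{\delta}_{\mathrm{opt}}^{(b)}$. Therefore, since $\hat{P}_b(\delta) = 1/Q_1(\delta)$ is a monotonically increasing function, it follows that almost surely

$$\frac{1}{Q_{1,a}} \geq \frac{1}{Q_{1,b}}. \tag{3.124}$$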


Thus,  for  some  positive  constant  K,  it  holds 


Ql,a 


-  Qi,b  (qp^  ^ 


(3.125) 


Taking  the  expectation  of  both  sides  of  (3.125)  and  rearranging  the  terms  yields 


E  ^  -PE  - ^  ^ 

.Ql,a\  [Ql,a  Ql,b\ 


(3.126) 


The  second  term  on  the  left  hand  side  of  (3.126)  is  evaluated  using  (3.123).  In 
addition,  since  >  0,  adding  its  expectation  to  the  left  hand  side  of  (3.126) 
does not change the inequality. Hence,


n2 


Cp{Q2,a 

n2 


(3.127) 


Now  we  set  K  = 


2  (  Dm-\-^r. 


>  0  and  upper  bound  the  left  hand  side  of  (3.127)  by 


using  (A.l)  (from  Appendix  A)  which  implies  that  >  y.  Namely, 


p  [  1 

[el.  Q?,a  (  Qla  j 

“  i  QL 


(3.128) 


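Recognizing that the leftmost side of (3.128) is the expected square of $\hat{P}_a\big(\tilde{\delta}_{\mathrm{opt}}^{(a)}\big)$ yields

$$\mathrm{E}\Big[\hat{P}_a^2\big(\tilde{\delta}_{\mathrm{opt}}^{(a)}\big)\Big] \geq \mathrm{E}\Big[\hat{P}_b^2\big(\tilde{\delta}_{\mathrm{opt}}^{(b)}\big)\Big]. \tag{3.129}$$

Finally, from (3.123), which implies that $\mathrm{E}\Big[\hat{P}_a\big(\tilde{\delta}_{\mathrm{opt}}^{(a)}\big)\Big] = \mathrm{E}\Big[\hat{P}_b\big(\tilde{\delta}_{\mathrm{opt}}^{(b)}\big)\Big]$, and (3.129), we obtain (3.122). □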

If the true power $P < \lim_{\delta\to\infty} E[\hat{P}_a(\delta)]$, the squared biases corresponding to $\hat{P}_a$ and $\hat{P}_b$ at, respectively, $\tilde{\delta}^a_{\mathrm{opt}}$ and $\tilde{\delta}^b_{\mathrm{opt}}$ are equal. Given the inequality between their variances (3.122), we conclude that their MSEs are related as

$$\mathrm{MSE}\big(\hat{P}_a(\tilde{\delta}^a_{\mathrm{opt}})\big) \geq \mathrm{MSE}\big(\hat{P}_b(\tilde{\delta}^b_{\mathrm{opt}})\big) \tag{3.130}$$

We now invoke the conjecture that the MSE loss incurred by using the bias-minimizing loading $\tilde{\delta}_{\mathrm{opt}}$ instead of the MSE-optimal loading $\delta_{\mathrm{opt}}$ is negligible for both estimators. Therefore, the optimal MSEs are approximately related as

$$\mathrm{MSE}\big(\hat{P}_a(\delta^a_{\mathrm{opt}})\big) \gtrsim \mathrm{MSE}\big(\hat{P}_b(\delta^b_{\mathrm{opt}})\big). \tag{3.131}$$

3.9.2  Comparison based on Sensitivity to Optimal Diagonal Loading

It has been observed in the simulation study that the power estimator $\hat{P}_b$ is more sensitive to the optimal diagonal loading than the power estimator $\hat{P}_a$. Relying on the conjecture that the variance has negligible impact on the value of the optimal diagonal loading, we define the sensitivity to optimal diagonal loading as the curvature of the squared bias around the diagonal loading which nulls out the squared bias.
Combining the results proved in Lemma 3.1 that

-  the power estimator $\hat{P}_b$ is greater than the power estimator $\hat{P}_a$ for diagonal loading $\delta > 0$;

-  the power estimator $\hat{P}_a$ is monotonically non-decreasing and has zero slope as $\delta \to 0^+$ and $\delta \to \infty$; and

-  the power estimator $\hat{P}_b$ is strictly monotonically increasing for all $\delta > 0$,

and that the squared bias corresponding to $\hat{P}_b$ is minimized at a smaller diagonal loading than that corresponding to $\hat{P}_a$, as given by (3.112), we conjecture that the power estimator $\hat{P}_b$ is more sensitive to the optimal diagonal loading than the power estimator $\hat{P}_a$.

Proving this conjecture or finding the specific conditions under which it holds is a possible direction for future study.

3.9.3  Numerical  Validation 

This part verifies the theoretical result from the previous part via simulations. The two scenarios are considered and the true power is evaluated in a standard way using (3.27).

The comparison between the MSEs of the two power estimators when the arrival process and the numbers of sensors and snapshots are as specified by Scenario 1 is shown in Fig. 3-36. The MSEs in the top figure are evaluated at the diagonal loadings which minimize the bias. The plots in the bottom figure compare the optimized MSEs. The simulation results in the top figure validate that the MSE corresponding to estimator $\hat{P}_a$ is lower bounded by the MSE corresponding to estimator $\hat{P}_b$, both evaluated at the loadings which optimize the corresponding squared biases. The bottom figure confirms the same conclusion for the optimized MSEs. Note that the MSEs of $\hat{P}_a$ and $\hat{P}_b$, compared in Fig. 3-36, are equal when steering close to the source directions. These are the ranges within which the optimal loading is close to zero and the two estimators are nearly identical.

The comparisons between the optimized MSEs and the MSEs evaluated at the diagonal loadings which optimize the corresponding squared biases of the two power estimators in Scenario 2 are shown in Fig. 3-37. In comparison to Scenario 1, a larger difference between the MSEs of the estimators $\hat{P}_a$ and $\hat{P}_b$ is observed when the number of snapshots is smaller than the number of sensors.


3.10  Conclusions 

This chapter presents how two diagonally loaded, MPDR beamformer based spatial power spectrum estimators behave in the snapshot deficient regime.

The almost sure convergence of the considered power spectrum estimators to non-random limits when the number of snapshots and the number of sensors grow large at the same rate is proved using random matrix theory methods. The variance and estimation MSE of one of the power estimators are characterized under the assumption that the input process is Gaussian distributed.

Further, the dependences of the spatial power estimators, their expectations and variances on diagonal loading are studied. Unlike in standard theory, it is shown that due to the specifics of the deficient sample support regime, the biases of both estimators are in general minimized for non-zero loadings. In addition, the MSE loss caused by using the minimizer of the squared bias as the optimal diagonal loading is conjectured to be negligible.

The dependence of the optimal loading on steering direction is also analyzed. It is shown that when steering in the direction of a source which is well separated from the interferers, the optimal processor is a matched filter, so $\delta_{\mathrm{opt}} \to \infty$. If the sources are closely spaced, the optimal loading is finite. Furthermore, we show that when steering close to the source direction, the optimal loading is small and can even be zero, meaning that the optimal estimator tends to perform adaptation as much as possible, fully relying on the available data. On the other hand, when the steering direction moves away from the source directions, the optimal loading tends to increase.

Figure 3-36: Scenario 1: Comparison between MSE performances of two power estimators. The MSE is evaluated at the loading which minimizes bias (top) and at the optimal diagonal loading (bottom).

Figure 3-37: Scenario 2: Comparison between MSE performances of two power estimators. The MSE is evaluated at the diagonal loading which minimizes bias (top) and at the optimal diagonal loading (bottom).

Finally, the performances of the two power estimators are compared and it is proved that the optimized $\hat{P}_b$ outperforms the optimized $\hat{P}_a$, and that the optimal $\delta$ is lower for estimator $\hat{P}_b$ than it is for estimator $\hat{P}_a$. However, the performance of estimator $\hat{P}_b$ is more sensitive to the determination of the correct diagonal loading than is that of estimator $\hat{P}_a$. All the presented results are validated via Monte-Carlo simulations.

As possible future work, the stated conjecture about the negligible impact of variance on the value of the optimal diagonal loading needs to be rigorously proved. Also, a rigorous analysis of the sensitivity of the power estimators to the optimal diagonal loading is needed to complete the comparison of power estimators. In addition, the results developed for Gaussian distributed snapshots could possibly be extended to more general snapshot statistics. Furthermore, we point out that the ultimate goal concerning the problem of diagonal loading for adaptive beamforming is to develop a scheme which determines the optimal diagonal loading based on the received data and steering direction. Also, the presented study on how spatial power spectrum estimation depends on diagonal loading could be applied to other estimation methods which rely on diagonal loading. Finally, the presented approach could possibly be applied for studying other regularization methods.

Chapter 4

Time-Varying Channel Tracking

4.1  Introduction 

The identification of an unknown channel from the input signal and noisy output observations using the Least Squares (LS) solution is a long-standing problem studied for different applications in areas such as adaptive signal processing [27], estimation theory [29], machine learning [8] and communications [57]. The Recursive Least Squares (RLS) algorithm is an efficient implementation of the LS algorithm which recursively updates the estimate of the channel based on the most recent input-output sample pair. This leads to an application of the RLS algorithm for tracking slowly varying channels [27]. One of the areas where this application is particularly important is wireless communications, where the receiver tracks the channel and detects the transmitted symbol based on the estimated channel impulse response [48], [39].

The tracking performance of the RLS algorithm has been extensively studied in the literature. For instance, the steady-state mean-square estimation error is analyzed in [39], [19] and [32], where the impulse response of a channel is modeled as an autoregressive process. The tracking performance of the RLS algorithm in both transient and steady-state regimes is studied in [17].

The literature also reports performance studies of the RLS algorithm in some specific scenarios. For instance, the steady-state performance of the RLS based identification of an unknown LTI channel, modeled as a finite impulse response (FIR) filter whose length is shorter than the unknown channel length, is reported in [63]. The impact of round-off errors on tracking performance is studied in [1]. The performance of the RLS based tracking of a time-varying channel whose coefficients are modeled using Jakes' model is studied in [47]. The convergence properties of the RLS algorithm and their dependence on the initialization are analyzed in [36]. The performance of the RLS algorithm when used to detect symbols received from a frequency flat fading channel on an array is studied in [5].

The main challenge in the analysis of the performance of the RLS algorithm is to characterize the sample correlation matrix (SCM) of the input process and the inverse of this matrix when the number of observations is comparable to the dimension of the matrix (i.e., the size of the filter). The common approach to this issue essentially consists of approximating the SCM with the ensemble correlation matrix. While these matrices are approximately equal when the number of stationary observations is large, they might differ significantly in the observation deficient regime. The performance study presented in this chapter addresses this problem by exploiting the tools and results from random matrix theory (RMT).

A performance study of the RLS algorithm when it is used to track a channel which varies according to a first order Markov process is presented in this chapter. The expressions for the signal prediction and channel estimation mean square errors (MSE) are derived and validated via simulations. The general results are applied to specific scenarios; as special cases we consider the behavior in the steady state, the performance of LS-based identification of a linear time-invariant channel and the performance of the sliding window RLS algorithm. Finally, several practical results, such as those characterizing the optimal exponential forgetting factor in the exponentially weighted RLS or the optimal averaging window length in the sliding window RLS algorithm, are obtained.

The characterization of the Stieltjes transform of the channel estimation correlation matrix when the RLS and extended RLS algorithms are used to track a channel modeled as a random walk is reported in [56] and [55]. The transient behavior of the signal to interference plus noise ratio (SINR) at the output of the RLS estimator in a time-invariant channel is studied in [38]. The analysis therein is fairly general and the performance metric is given as a solution to a set of non-linear equations. In comparison to those results, we consider the signal prediction and channel estimation MSEs as performance metrics, study the transient behavior of the RLS algorithm directly and obtain the steady-state results as a special case. In addition, we model the channel variations as a first order Markov process. Finally, by employing assumptions justifiable in the scenarios of practical interest, we obtain cleaner and, where possible, closed form characterizations.

This chapter is organized as follows. The background on channel tracking is summarized in Section 4.2. A general theory for the performance characterization of the LS based tracking of the first order Markov channel is developed in Section 4.3. The subsequent sections use the general results and specialize them for specific problems. Section 4.4 studies the algorithm's performance in the steady state. LTI system identification is treated in Section 4.5. The sliding window LS algorithm is studied in Section 4.6. The comparison between the exponentially weighted LS and the sliding window LS algorithm when used to track a first order Markov channel is discussed in Section 4.7. Section 4.8 concludes this chapter.

4.2  Background 

The problem of tracking a time-varying channel using the Least Squares algorithm and the challenges associated with analyzing it are introduced in this section. In addition, relevant performance analysis results from the literature are given. Finally, the assumptions used in the theoretical analyses and their justifications are presented.

4.2.1  Problem  Formulation 

Given the inputs to a finite impulse response (FIR) channel and the channel outputs corrupted by noise, the channel impulse response is estimated using the Least Squares (LS) algorithm. A channel of length $m$ has impulse response at time $n$ expressed as a vector $\mathbf{w}(n)$. The channel variation is modeled as a first order Markov process

$$\mathbf{w}(n) = a\,\mathbf{w}(n-1) + \boldsymbol{\omega}(n), \tag{4.1}$$

where $\boldsymbol{\omega}(n)$ is a zero mean i.i.d. process noise with correlation matrix $\sigma_o^2\mathbf{I}$. The channel impulse response at the initial time, $\mathbf{w}(0)$, is random and has identity covariance. A parameter $a < 1$ is a state transition constant. For $a = 1$, (4.1) becomes a random walk model.

The channel input at time $n$ is the vector $\mathbf{u}(n)$ whose entries are the input samples $u(k)$, $k = n, n-1, \ldots, n-m+1$. The channel output is corrupted with an additive noise process $v(n)$ such that the channel output at time $n$ is

$$d(n) = \mathbf{w}^H(n)\mathbf{u}(n) + v(n). \tag{4.2}$$

We assume the input and additive noise are independent. The noise process, $v(n)$, is i.i.d. with zero mean and variance $\sigma_v^2$. In addition, the input signal vectors $\mathbf{u}(n)$ are assumed to be zero-mean with correlation matrix $E[\mathbf{u}(n)\mathbf{u}^H(n)] = \mathbf{R}$.

Given the input signal vectors $\mathbf{u}(i)$ and the corresponding desired outputs $d(i)$ up to time $n$, the estimated channel impulse response $\hat{\mathbf{w}}(n)$ is [27]

$$\hat{\mathbf{w}}(n) = \mathbf{R}^{-1}(n)\mathbf{r}(n), \tag{4.3}$$

where $\mathbf{R}(n)$ is a sample correlation matrix (SCM), computed as

$$\mathbf{R}(n) = \sum_{i=1}^{n}\lambda^{n-i}\mathbf{u}(i)\mathbf{u}^H(i) + \delta\lambda^{n}\mathbf{I}, \tag{4.4}$$

and $\mathbf{r}(n)$ is an input-output cross-correlation vector, given by

$$\mathbf{r}(n) = \sum_{i=1}^{n}\lambda^{n-i}\mathbf{u}(i)d^*(i). \tag{4.5}$$

A forgetting factor $\lambda \in (0,1)$, usually very close to 1, accommodates the time-variability of the channel by suppressing the past observations not relevant for current estimation. The real-valued, non-negative quantity $\delta$ in (4.4) is a diagonal loading parameter which handles the start-up transient of the algorithm [27].

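The performance of the tracking algorithm is measured via the channel estimation error $\boldsymbol{\varepsilon}(n)$,

$$\boldsymbol{\varepsilon}(n) = \mathbf{w}(n) - \hat{\mathbf{w}}(n), \tag{4.6}$$

and the signal prediction error $\xi(n)$,

$$\xi(n) = d(n) - \hat{\mathbf{w}}^H(n-1)\mathbf{u}(n). \tag{4.7}$$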
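
The model (4.1)-(4.2) and the estimator (4.3)-(4.5) can be illustrated with a short simulation. The sketch below is not taken from the thesis: the function name, the parameter values, the use of circularly symmetric complex Gaussian inputs with identity correlation, and the process noise correlation $\sigma_o^2\mathbf{I}$ are all assumptions made for illustration. It updates $\mathbf{R}(n)$ and $\mathbf{r}(n)$ recursively, which reproduces the sums in (4.4) and (4.5) when $\mathbf{R}(0) = \delta\mathbf{I}$ and $\mathbf{r}(0) = \mathbf{0}$, and records the squared errors (4.6) and (4.7).

```python
import numpy as np


def simulate_ls_tracking(n=200, m=8, a=0.999, lam=0.98, delta=1e-2,
                         sigma_o=0.01, sigma_v=0.1, seed=0):
    """Track a first order Markov channel with exponentially weighted LS.

    All parameter values are illustrative assumptions, not thesis settings.
    Returns the squared estimation and prediction errors at each time step.
    """
    rng = np.random.default_rng(seed)

    def crandn(size):
        # Unit-variance circularly symmetric complex Gaussian samples.
        return (rng.standard_normal(size)
                + 1j * rng.standard_normal(size)) / np.sqrt(2)

    w = crandn(m)                           # w(0) with identity covariance
    R = delta * np.eye(m, dtype=complex)    # R(0) = delta * I
    r = np.zeros(m, dtype=complex)          # r(0) = 0
    w_hat = np.zeros(m, dtype=complex)      # estimate before any data
    est_err = np.empty(n)
    pred_err = np.empty(n)

    for t in range(n):
        w = a * w + sigma_o * crandn(m)              # channel update (4.1)
        u = crandn(m)                                # i.i.d. input vector
        d = np.vdot(w, u) + sigma_v * crandn(1)[0]   # output w^H u + v (4.2)
        xi = d - np.vdot(w_hat, u)                   # prediction error (4.7)
        R = lam * R + np.outer(u, u.conj())          # recursion for (4.4)
        r = lam * r + u * np.conj(d)                 # recursion for (4.5)
        w_hat = np.linalg.solve(R, r)                # LS estimate (4.3)
        est_err[t] = np.linalg.norm(w - w_hat) ** 2  # squared norm of (4.6)
        pred_err[t] = abs(xi) ** 2

    return est_err, pred_err


if __name__ == "__main__":
    est_err, pred_err = simulate_ls_tracking()
    print("mean squared estimation error, last 50 steps:", est_err[-50:].mean())
    print("mean squared prediction error, last 50 steps:", pred_err[-50:].mean())
```

Averaging the returned errors over many independent runs gives Monte Carlo estimates of the MSEs studied in the remainder of the chapter. Solving the linear system at every step keeps the sketch close to (4.3); a practical RLS implementation would instead propagate the inverse recursively.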


4.2.2  Relevant  Results 


The relevant results of the studies reported in the literature are derived and outlined in [27]. The performance analysis in [27] relies on the direct averaging assumption and the approximation of the sample correlation matrix $\mathbf{R}(n)$ of the stationary input process (such that the unity forgetting factor is used) with $\frac{1}{n}\mathbf{R}(n) \approx \mathbf{R}$, which holds when the number of observations $n$ is much larger than the channel length $m$.

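When a channel vector varies according to an "almost" random walk model (the model (4.1) with $a \to 1$) and the forgetting factor $\lambda$ is very close to 1, the channel estimation error in the steady state is given by [27]

$$E\left[\|\boldsymbol{\varepsilon}(n)\|_2^2\right] \approx \frac{1-\lambda}{2}\,\sigma_v^2\,\mathrm{tr}\{\mathbf{R}^{-1}\} + \frac{1}{2(1-\lambda)}\,\mathrm{tr}\{\mathbf{R}_o\}, \tag{4.8}$$

where $\|\cdot\|_2$ is the L2 norm of a vector, $\mathrm{tr}\{\cdot\}$ denotes the trace of a matrix and $\mathbf{R}_o$ is the covariance matrix of the process noise $\boldsymbol{\omega}(n)$. By equating the error terms corresponding to the process and observation noise, the optimal value of the forgetting factor $\lambda_{\mathrm{opt}}$ is evaluated as [27]

$$\lambda_{\mathrm{opt}} = 1 - \frac{1}{\sigma_v}\left(\frac{\mathrm{tr}\{\mathbf{R}_o\}}{\mathrm{tr}\{\mathbf{R}^{-1}\}}\right)^{1/2}. \tag{4.9}$$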
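
As a quick numerical illustration of how (4.9) follows from (4.8), the sketch below evaluates both expressions for one set of values. The function names and the numbers ($m = 10$, $\mathbf{R} = \mathbf{I}$, $\mathbf{R}_o = 10^{-6}\mathbf{I}$, $\sigma_v^2 = 10^{-2}$) are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math


def steady_state_mse(lam, sigma_v2, tr_R_inv, tr_R_o):
    """Right-hand side of (4.8) for forgetting factor lam."""
    mu = 1.0 - lam
    observation_term = 0.5 * mu * sigma_v2 * tr_R_inv  # observation noise part
    process_term = tr_R_o / (2.0 * mu)                 # process noise part
    return observation_term + process_term


def optimal_forgetting_factor(sigma_v2, tr_R_inv, tr_R_o):
    """Forgetting factor of (4.9), which equates the two terms of (4.8)."""
    return 1.0 - math.sqrt(tr_R_o / tr_R_inv) / math.sqrt(sigma_v2)


# Hypothetical example: m = 10, R = I, R_o = 1e-6 * I, sigma_v^2 = 1e-2.
m = 10
sigma_v2 = 1e-2
tr_R_inv = float(m)
tr_R_o = 1e-6 * m

lam_opt = optimal_forgetting_factor(sigma_v2, tr_R_inv, tr_R_o)
mse_opt = steady_state_mse(lam_opt, sigma_v2, tr_R_inv, tr_R_o)

if __name__ == "__main__":
    print("lambda_opt:", lam_opt)
    print("steady-state MSE at lambda_opt:", mse_opt)
```

For these values $1 - \lambda_{\mathrm{opt}} = 10^{-3}/10^{-1} = 0.01$, and each of the two terms in (4.8) equals $5\times10^{-4}$. Since (4.8) has the form $A\mu + B/\mu$ with $\mu = 1 - \lambda$, equating the two terms also minimizes their sum.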


For the case when the system is LTI, i.e., $a = 1$ and $\mathbf{R}_o = \mathbf{0}$, and $\lambda = 1$, the expressions for the mean-square values of the channel estimation and signal prediction errors are for $n > m$ given by [27]

E||k(n)llJ  «  (4.10) 

and 

E[||^(n)||^]  ^  (4.11) 

The performance analysis given in [27] assumes the number of observations $n$ is much larger than the channel length $m$. However, when $n$ and $m$ are of the same order, the estimated and true correlation matrices may differ significantly. The theoretical characterization of the LS algorithm, developed in this chapter with the use of random matrix theory methods, does not require $n$ to be large. Furthermore, compared to [27], the performance of the LS-based tracking of a broader class of channel variations is characterized.

4.2.3  Assumptions  and  Remarks 

The  main  assumptions  used  in  the  performance  characterization  presented  in  this 
chapter  are  as  follows. 

(1)  The forgetting factor, $\lambda$, is assumed to be close to 1 in the derivation of both the channel estimation and signal prediction MSE.

(2)  The state transition parameter, $a$, is assumed to be close to 1 in the derivation of the signal prediction MSE.

(3)  The input observation vectors $\mathbf{u}(n)$ are assumed to be independent and identically distributed.

Note  that  the  analysis  in  [27]  uses  the  same  assumptions. 

Although these assumptions might seem somewhat too restrictive, the simulations show that the derived expressions are valid for the ranges of $\lambda$ and $a$ that are in fact of practical importance. This is shown in Figures 4-4 and 4-17 and emphasized in the corresponding discussions. In short, the RLS algorithm fails to accurately track a first order Markov channel whose state transition parameter $a$ is not sufficiently close to 1. Also, the effective averaging window size corresponding to a value of the forgetting factor $\lambda$ not sufficiently close to 1 might be too short with respect to the number of unknown coefficients, which in turn does not lead to acceptable tracking performance.

The third assumption is used in the evaluation of the moments of the SCM using random matrix methods, in particular Theorem 2.1, as elaborated in Section 2.4. In reality, the observation vectors at the input of an FIR filter are not independent. However, the results obtained via simulations conducted such that the consecutive observation vectors are shifted with respect to each other confirm the validity of the derived expressions. In addition, this assumption is fairly common in adaptive filter theory.

As a final remark, random matrix theory provides tools for evaluating the performance metrics of interest without restricting $\lambda$ and $a$ to be sufficiently close to 1. However, our intention is to characterize the performance using relatively clean and, where possible, closed form expressions. Such expressions are derived under the stated assumptions, which are justified in the regime of practical importance where the RLS algorithm is able to track the channel reasonably well.


4.3  Performance  Analysis 


The channel estimation and signal prediction MSEs associated with the LS-based tracking of a first order Markov channel with deficient sample support are characterized in this section.

4.3.1  Channel Estimation Error


A channel impulse response, modeled as a first order Markov process (4.1), is at time $n$ expressed in terms of the initial impulse response $\mathbf{w}(0)$ as

$$\mathbf{w}(n) = a^{n}\mathbf{w}(0) + \sum_{i=1}^{n} a^{n-i}\boldsymbol{\omega}(i). \tag{4.12}$$

Substituting (4.2) into (4.5), the cross-correlation vector is expressed as

$$\mathbf{r}(n) = \sum_{i=1}^{n}\lambda^{n-i}\mathbf{u}(i)\left(\mathbf{u}^H(i)\mathbf{w}(i) + v^*(i)\right). \tag{4.13}$$

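After substituting (4.13) into (4.3) and using (4.12), the estimated channel vector becomes

$$\begin{aligned}
\hat{\mathbf{w}}(n) ={}& \mathbf{R}^{-1}(n)\sum_{i=1}^{n} a^{i}\lambda^{n-i}\mathbf{u}(i)\mathbf{u}^H(i)\,\mathbf{w}(0) \\
&+ \mathbf{R}^{-1}(n)\sum_{i=1}^{n}\lambda^{n-i}\mathbf{u}(i)\mathbf{u}^H(i)\sum_{j=1}^{i} a^{i-j}\boldsymbol{\omega}(j) \\
&+ \mathbf{R}^{-1}(n)\sum_{i=1}^{n}\lambda^{n-i}\mathbf{u}(i)v^*(i).
\end{aligned} \tag{4.14}$$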


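The channel estimation error, after substitution of (4.12) and (4.14) into (4.6), can be decomposed into three terms

$$\boldsymbol{\varepsilon}(n) = \boldsymbol{\varepsilon}_1(n) + \boldsymbol{\varepsilon}_2(n) + \boldsymbol{\varepsilon}_3(n), \tag{4.15}$$

where $\boldsymbol{\varepsilon}_1(n)$ is the error induced by the initial channel response

$$\boldsymbol{\varepsilon}_1(n) = \left[a^{n}\mathbf{I} - \mathbf{R}^{-1}(n)\sum_{i=1}^{n} a^{i}\lambda^{n-i}\mathbf{u}(i)\mathbf{u}^H(i)\right]\mathbf{w}(0), \tag{4.16}$$

$\boldsymbol{\varepsilon}_2(n)$ is the error induced by the random portion of the channel dynamics (process noise),

$$\boldsymbol{\varepsilon}_2(n) = \sum_{i=1}^{n} a^{n-i}\boldsymbol{\omega}(i) - \mathbf{R}^{-1}(n)\sum_{i=1}^{n}\lambda^{n-i}\mathbf{u}(i)\mathbf{u}^H(i)\sum_{j=1}^{i} a^{i-j}\boldsymbol{\omega}(j), \tag{4.17}$$

and $\boldsymbol{\varepsilon}_3(n)$ is the observation noise induced error

$$\boldsymbol{\varepsilon}_3(n) = -\mathbf{R}^{-1}(n)\sum_{i=1}^{n}\lambda^{n-i}\mathbf{u}(i)v^*(i). \tag{4.18}$$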

Due to the assumed independence of the initial channel response, input process and observation and process noises, the three error terms are uncorrelated. In the following, the expected norms of these error terms are evaluated. The identities $\mathrm{tr}\{\mathbf{A}\mathbf{B}\} = \mathrm{tr}\{\mathbf{B}\mathbf{A}\}$ and $E[\mathrm{tr}\{\mathbf{A}\}] = \mathrm{tr}\{E[\mathbf{A}]\}$ are used.
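
The decomposition (4.15) is an exact algebraic identity for every realization, which can be checked numerically. The sketch below is illustrative only: the function name, the parameter values and the complex Gaussian data are assumptions. It forms the LS estimate from (4.3)-(4.5), evaluates the three terms (4.16)-(4.18) separately, and returns all four vectors so that their sum can be compared with the total error.

```python
import numpy as np


def error_decomposition(n=40, m=5, a=0.99, lam=0.95, delta=0.1,
                        sigma_o=0.05, sigma_v=0.1, seed=1):
    """Return eps(n) and the three terms of (4.16)-(4.18) for one realization.

    Parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    def crandn(*shape):
        return (rng.standard_normal(shape)
                + 1j * rng.standard_normal(shape)) / np.sqrt(2)

    w0 = crandn(m)                  # w(0)
    omega = sigma_o * crandn(n, m)  # omega[i-1] holds omega(i)
    U = crandn(n, m)                # U[i-1] holds u(i)
    v = sigma_v * crandn(n)         # v[i-1] holds v(i)

    # dyn[i-1] = sum_{j<=i} a^(i-j) omega(j), so w(i) = a^i w(0) + dyn[i-1].
    dyn = np.zeros((n, m), dtype=complex)
    acc = np.zeros(m, dtype=complex)
    for i in range(n):
        acc = a * acc + omega[i]
        dyn[i] = acc
    idx = np.arange(1, n + 1)
    W = (a ** idx)[:, None] * w0 + dyn                  # W[i-1] = w(i), (4.12)
    d = np.einsum('ik,ik->i', W.conj(), U) + v          # d(i) = w^H u + v

    wt = lam ** (n - idx)                               # lambda^(n-i)
    outer = np.einsum('ik,il->ikl', U, U.conj())        # u(i) u^H(i)
    R = np.einsum('i,ikl->kl', wt, outer) + delta * lam ** n * np.eye(m)
    r = np.einsum('i,ik,i->k', wt, U, d.conj())         # (4.5)
    w_hat = np.linalg.solve(R, r)                       # (4.3)
    eps = W[-1] - w_hat                                 # (4.6)

    S1 = np.einsum('i,ikl->kl', wt * a ** idx, outer)
    eps1 = (a ** n * np.eye(m) - np.linalg.solve(R, S1)) @ w0        # (4.16)
    S2 = np.einsum('i,ikl,il->k', wt, outer, dyn)
    eps2 = dyn[-1] - np.linalg.solve(R, S2)                          # (4.17)
    eps3 = -np.linalg.solve(R, np.einsum('i,ik,i->k', wt, U, v.conj()))  # (4.18)
    return eps, eps1, eps2, eps3


if __name__ == "__main__":
    eps, eps1, eps2, eps3 = error_decomposition()
    print("max deviation:", np.max(np.abs(eps - (eps1 + eps2 + eps3))))
```

The deviation printed by the script should be at the level of floating point round-off, since no approximation is involved at this stage.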

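The mean square value of the initial channel response induced error $\boldsymbol{\varepsilon}_1(n)$ is, after some algebraic manipulation, given by

$$\begin{aligned}
E\left[\|\boldsymbol{\varepsilon}_1(n)\|_2^2\right] ={}& m\,a^{2n} - 2a^{n}\sum_{i=1}^{n}\lambda^{n-i}a^{i}\,E\left[\mathbf{u}^H(i)\mathbf{R}^{-1}(n)\mathbf{u}(i)\right] \\
&+ \sum_{i=1}^{n}\sum_{j=1}^{n}\lambda^{2n-i-j}a^{i+j}\,E\left[\mathrm{tr}\left\{\mathbf{u}(i)\mathbf{u}^H(i)\mathbf{R}^{-2}(n)\mathbf{u}(j)\mathbf{u}^H(j)\right\}\right].
\end{aligned} \tag{4.19}$$

Recall that the initial channel response vector $\mathbf{w}(0)$ is assumed to be random and of identity correlation. Therefore, its correlation matrix gives rise to the multiplication with $m$ in the first term and is absorbed in the trace operator in the last term of (4.19).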

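The channel dynamics induced error $\boldsymbol{\varepsilon}_2(n)$ can be expressed as a sum of uncorrelated error terms (which are denoted $\mathbf{e}_k(n)$)

$$\boldsymbol{\varepsilon}_2(n) = \sum_{k=1}^{n}\mathbf{e}_k(n), \tag{4.20}$$

where

$$\mathbf{e}_k(n) = \left[a^{n-k}\mathbf{I} - \mathbf{R}^{-1}(n)\sum_{i=k}^{n} a^{i-k}\lambda^{n-i}\mathbf{u}(i)\mathbf{u}^H(i)\right]\boldsymbol{\omega}(k). \tag{4.21}$$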

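The mean square value of $\boldsymbol{\varepsilon}_2(n)$ is thus

$$E\left[\|\boldsymbol{\varepsilon}_2(n)\|_2^2\right] = \sum_{k=1}^{n} E\left[\|\mathbf{e}_k(n)\|_2^2\right], \tag{4.22}$$

where the expression for $E[\|\mathbf{e}_k(n)\|_2^2]$, obtained after some algebraic manipulation, is given by

$$\begin{aligned}
E\left[\|\mathbf{e}_k(n)\|_2^2\right] = \sigma_o^2\Bigg( & m\,a^{2(n-k)} - 2a^{n-k}\sum_{i=k}^{n} a^{i-k}\lambda^{n-i}\,E\left[\mathbf{u}^H(i)\mathbf{R}^{-1}(n)\mathbf{u}(i)\right] \\
&+ \sum_{i=k}^{n}\sum_{j=k}^{n} a^{i+j-2k}\lambda^{2n-i-j}\,E\left[\mathrm{tr}\left\{\mathbf{u}(i)\mathbf{u}^H(i)\mathbf{R}^{-2}(n)\mathbf{u}(j)\mathbf{u}^H(j)\right\}\right]\Bigg).
\end{aligned} \tag{4.23}$$

Finally, the expected norm of the observation noise induced error $\boldsymbol{\varepsilon}_3(n)$ is given by

$$E\left[\|\boldsymbol{\varepsilon}_3(n)\|_2^2\right] = \sigma_v^2\sum_{i=1}^{n}\lambda^{2(n-i)}\,E\left[\mathbf{u}^H(i)\mathbf{R}^{-2}(n)\mathbf{u}(i)\right]. \tag{4.24}$$

Overall, the channel estimation MSE is

$$E\left[\|\boldsymbol{\varepsilon}(n)\|_2^2\right] = \sum_{i=1}^{3} E\left[\|\boldsymbol{\varepsilon}_i(n)\|_2^2\right]. \tag{4.25}$$

4.3.2  Signal Prediction Error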

4.3.2  Signal  Prediction  Error 


Substituting (4.2) into (4.7) and using model (4.1), the signal prediction error is expressed as

$$\xi(n) = \left(a\,\mathbf{w}^H(n-1) - \hat{\mathbf{w}}^H(n-1)\right)\mathbf{u}(n) + \boldsymbol{\omega}^H(n)\mathbf{u}(n) + v(n). \tag{4.26}$$

The simulation results, presented in Section 4.6.2, imply that a channel varying according to a first order Markov model is not trackable unless $a$ is close to 1. Consequently, to obtain a compact representation of $\xi(n)$, it is assumed that $a \approx 1$ so that

$$\xi(n) \approx \boldsymbol{\varepsilon}^H(n-1)\mathbf{u}(n) + \boldsymbol{\omega}^H(n)\mathbf{u}(n) + v(n). \tag{4.27}$$

While $\xi(n)$ does not explicitly depend on $a$, this dependence is implicitly accounted for via the dependence of $\boldsymbol{\varepsilon}(n)$ on the state transition parameter, $a$.

If the input process has identity correlation, the signal prediction MSE is, using (4.27), given by

$$E\left[|\xi(n)|^2\right] \approx E\left[\|\boldsymbol{\varepsilon}(n-1)\|_2^2\right] + m\sigma_o^2 + \sigma_v^2. \tag{4.28}$$

Thus, to evaluate the signal prediction MSE, we need to characterize the first term in (4.28).

4.3.3  Theoretical  Prediction  of  Unknown  Quantities 


The MSE values of the error terms expressed with (4.19), (4.23) and (4.24) depend on two quantities which we define as

$$V_k(n,m,i) = E\left[\mathbf{u}^H(i)\mathbf{R}^{-k}(n)\mathbf{u}(i)\right], \tag{4.29}$$

$$W(n,m,i,j) = E\left[\mathrm{tr}\left\{\mathbf{u}(i)\mathbf{u}^H(i)\mathbf{R}^{-2}(n)\mathbf{u}(j)\mathbf{u}^H(j)\right\}\right]. \tag{4.30}$$

The index $k$ in $V_k$ can be either 1 or 2. These quantities are studied in the following parts and approximately expressed in terms of the expected moments of the SCM, $M_k(m,n)$, defined as the expected normalized trace of the powers of the SCM inverse, that is

$$M_k(m,n) = \frac{1}{m}E\left[\mathrm{tr}\left\{\mathbf{R}^{-k}(n)\right\}\right]. \tag{4.31}$$

We are interested in the expected moments whose corresponding index $k$ is 1 or 2. Note that $M_0(m,n) = 1$.


Two cases are considered. In the first case, the ensemble correlation matrix of the input process $\mathbf{R}$ is unconstrained and the expected moment $M_1(m,n)$ is approximated for $n > m$ and $\delta = 0$ with the limiting moment, as is elaborated in Section 2.7. The limiting moment $M_1$ corresponding to the exponentially weighted SCM with forgetting factor $\lambda$ and $\delta = 0$ is characterized in Section 2.5 and given by (2.37). Hence, the expected first moment is approximately given by the solution to the fixed point equation


1 

Mi(m,  n) 


in—k 


y - 

^tr{R“^}  +  n) 


(4.32) 


Note that in the special case when $\lambda = 1$, $M_1(m,n)$ is given in closed form

$$M_1(m,n) = \frac{1}{m(n-m)}\,\mathrm{tr}\left\{\mathbf{R}^{-1}\right\}. \tag{4.33}$$
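
The closed form (4.33) can be compared against a Monte Carlo estimate of the definition (4.31). The sketch below is an illustration under stated assumptions rather than the simulation setup of the thesis: the snapshots are taken to be circularly symmetric complex Gaussian, the ensemble correlation matrix has the hypothetical entries $\rho^{|k-l|}$, and the function names are introduced here for convenience.

```python
import numpy as np


def first_moment_mc(R, n, trials=2000, seed=0):
    """Monte Carlo estimate of M_1(m, n) in (4.31) for lambda = 1, delta = 0."""
    rng = np.random.default_rng(seed)
    m = R.shape[0]
    L = np.linalg.cholesky(R)   # R = L L^H, used to colour white snapshots
    total = 0.0
    for _ in range(trials):
        Z = (rng.standard_normal((m, n))
             + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
        U = L @ Z               # columns are snapshots with correlation R
        scm = U @ U.conj().T    # rectangular-window SCM, no loading
        total += np.trace(np.linalg.inv(scm)).real / m
    return total / trials


def first_moment_closed_form(R, n):
    """Right-hand side of (4.33)."""
    m = R.shape[0]
    return np.trace(np.linalg.inv(R)).real / (m * (n - m))


# Hypothetical example: m = 4 coefficients, n = 20 snapshots.
m, n, rho = 4, 20, 0.5
R_example = rho ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
m1_simulated = first_moment_mc(R_example, n)
m1_predicted = first_moment_closed_form(R_example, n)

if __name__ == "__main__":
    print("simulated M_1:", m1_simulated)
    print("closed form M_1:", m1_predicted)
```

With these settings the two numbers should agree to within the Monte Carlo error. The number of snapshots is deliberately only a few times the channel length, which is the regime where replacing the SCM by $n\mathbf{R}$ would be inaccurate.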


In the second case, $\mathbf{R} = \mathbf{I}$ and there is no restriction on the number of observation vectors, $n$. This means that $n$ can be smaller than $m$, in which case a positive diagonal loading factor $\delta$ is needed. The expected moments with the corresponding index $k = 1, 2$ are approximated with the limiting moments, evaluated in closed form in Section 2.5 and given by

$$M_1(m,n) = \max\left(0,\, 1 - \frac{1}{c}\right)\frac{1}{\delta} + \frac{\sqrt{\delta^2 + (m-n)^2 + 2\delta(m+n)} - |n-m| - \delta}{2\delta m} \tag{4.34}$$

and

$$M_2(m,n) = \max\left(0,\, 1 - \frac{1}{c}\right)\frac{1}{\delta^2} - \frac{|n-m|}{2\delta^2 m} + \frac{(m-n)^2 + \delta(m+n)}{2\delta^2 m\sqrt{\delta^2 + (m-n)^2 + 2\delta(m+n)}}. \tag{4.35}$$
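
The closed forms (4.34) and (4.35) can be checked against simulated moments in both the $n < m$ and $n > m$ regimes. The sketch below is illustrative: it assumes $c = m/n$ (so that $\max(0, 1 - 1/c)$ is the fraction of zero eigenvalues of the unloaded SCM), complex Gaussian snapshots with identity correlation, $\lambda = 1$, and function names that are introduced here only for the example.

```python
import numpy as np


def m1_closed_form(m, n, delta):
    """Limiting first moment (4.34), assuming c = m / n."""
    root = np.sqrt(delta**2 + (m - n)**2 + 2 * delta * (m + n))
    return (max(0.0, 1.0 - n / m) / delta
            + (root - abs(n - m) - delta) / (2 * delta * m))


def m2_closed_form(m, n, delta):
    """Limiting second moment (4.35), assuming c = m / n."""
    root = np.sqrt(delta**2 + (m - n)**2 + 2 * delta * (m + n))
    return (max(0.0, 1.0 - n / m) / delta**2
            - abs(n - m) / (2 * delta**2 * m)
            + ((m - n)**2 + delta * (m + n)) / (2 * delta**2 * m * root))


def empirical_moments(m, n, delta, trials=200, seed=0):
    """Monte Carlo estimates of M_1 and M_2 for R = I and lambda = 1."""
    rng = np.random.default_rng(seed)
    acc = np.zeros(2)
    for _ in range(trials):
        U = (rng.standard_normal((m, n))
             + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
        # Eigenvalues of the loaded SCM  U U^H + delta * I.
        eig = np.linalg.eigvalsh(U @ U.conj().T) + delta
        acc += [np.mean(1.0 / eig), np.mean(1.0 / eig**2)]
    return acc / trials


if __name__ == "__main__":
    for m, n in [(40, 20), (20, 40)]:
        sim1, sim2 = empirical_moments(m, n, delta=1.0)
        print(f"m={m}, n={n}: M1 sim {sim1:.5f} vs {m1_closed_form(m, n, 1.0):.5f}; "
              f"M2 sim {sim2:.5f} vs {m2_closed_form(m, n, 1.0):.5f}")
```

Because $M_2$ is the negative derivative of $M_1$ with respect to $\delta$, the two functions can also be cross-checked against each other with a finite difference.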
Evaluation of $V_k$


We approximate in this part the quadratic form $V_k$ for $k = 1, 2$. As a remark, an exact asymptotic characterization can be obtained by using the tools from random matrix theory, such as the integration by parts formula and the Poincaré-Nash inequality. However, our goal is to obtain simple, yet accurate characterizations by employing assumptions justified in practical scenarios.

A rectangular window ($\lambda = 1$) is considered first. In that case, each input snapshot $\mathbf{u}(i)$, $i = 1, 2, \ldots, n$, contributes to the sample correlation matrix $\mathbf{R}(n)$ equally, with unit weight. Due to symmetry, $V_k(n,m,i)$ does not depend on the index $i$, i.e., $V_k(n,m,i) = V_k(n,m)$. Therefore,

$$\begin{aligned}
V_k(n,m) &= \frac{1}{n}\sum_{i=1}^{n} V_k(n,m,i) = \frac{1}{n}\sum_{i=1}^{n} E\left[\mathbf{u}^H(i)\mathbf{R}^{-k}(n)\mathbf{u}(i)\right] \\
&= \frac{1}{n}E\left[\mathrm{tr}\left\{\mathbf{R}^{-k}(n)\sum_{i=1}^{n}\mathbf{u}(i)\mathbf{u}^H(i)\right\}\right] = \frac{1}{n}E\left[\mathrm{tr}\left\{\mathbf{R}^{-k}(n)\left(\mathbf{R}(n) - \delta\mathbf{I}\right)\right\}\right] \\
&= \frac{m}{n}\left(M_{k-1}(m,n) - \delta M_k(m,n)\right).
\end{aligned} \tag{4.36}$$

When exponential weighting is employed, different observation vectors are weighted differently, so that $V_k(n,m,i)$ depends on the observation index $i$. In the absence of a better approach, we assume the forgetting factor $\lambda$ is very close to 1 such that $V_k(n,m,i) \approx V_k(n,m)$. This approximation is further justified because $k$ can only take on values 1 or 2 (larger values of $k$ make the approximation less accurate). Therefore,

$$\sum_{i=1}^{n}\lambda^{n-i}V_k(n,m,i) \approx \frac{1-\lambda^{n}}{1-\lambda}\,V_k(n,m). \tag{4.37}$$

Without the approximation, the expression becomes

$$\sum_{i=1}^{n}\lambda^{n-i}V_k(n,m,i) = E\left[\mathrm{tr}\left\{\mathbf{R}^{-k}(n)\sum_{i=1}^{n}\lambda^{n-i}\mathbf{u}(i)\mathbf{u}^H(i)\right\}\right] = m\left(M_{k-1}(m,n) - \delta\lambda^{n}M_k(m,n)\right). \tag{4.38}$$

Therefore, equating the right-hand sides of (4.37) and (4.38),

$$V_k(n,m,i) \approx m\,\frac{1-\lambda}{1-\lambda^{n}}\left(M_{k-1}(m,n) - \delta\lambda^{n}M_k(m,n)\right). \tag{4.39}$$

Letting $\lambda \to 1$ and noting that $(1-\lambda)/(1-\lambda^{n}) \to 1/n$, (4.36) is recovered from (4.39). In addition, note that the expected moments $M_k$ from (4.39) correspond to the exponentially weighted SCM.
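
The step from the weighted sum of quadratic forms to the moments in (4.38) rests on an identity that holds for every realization, before the expectation is taken: since $\sum_i \lambda^{n-i}\mathbf{u}(i)\mathbf{u}^H(i) = \mathbf{R}(n) - \delta\lambda^n\mathbf{I}$, the weighted sum equals $\mathrm{tr}\{\mathbf{R}^{-(k-1)}(n)\} - \delta\lambda^n\mathrm{tr}\{\mathbf{R}^{-k}(n)\}$. The sketch below checks this numerically; the function name, the data and the parameter values are assumptions made for illustration.

```python
import numpy as np


def weighted_quadratic_forms(n=60, m=6, lam=0.97, delta=0.5, k=2, seed=3):
    """Evaluate both sides of the per-realization identity behind (4.38)."""
    rng = np.random.default_rng(seed)
    U = (rng.standard_normal((n, m))
         + 1j * rng.standard_normal((n, m))) / np.sqrt(2)  # U[i-1] = u(i)
    wt = lam ** (n - np.arange(1, n + 1))                  # lambda^(n-i)

    # Exponentially weighted, diagonally loaded SCM of (4.4).
    R = (U.T * wt) @ U.conj() + delta * lam ** n * np.eye(m)
    R_inv = np.linalg.inv(R)
    R_inv_k = np.linalg.matrix_power(R_inv, k)
    R_inv_km1 = np.linalg.matrix_power(R_inv, k - 1)   # identity when k = 1

    # Left side: sum_i lambda^(n-i) u^H(i) R^{-k}(n) u(i).
    lhs = sum(wt[i] * np.vdot(U[i], R_inv_k @ U[i]) for i in range(n)).real
    # Right side: tr{R^{-(k-1)}} - delta * lambda^n * tr{R^{-k}}.
    rhs = (np.trace(R_inv_km1) - delta * lam ** n * np.trace(R_inv_k)).real
    return lhs, rhs


if __name__ == "__main__":
    for k in (1, 2):
        lhs, rhs = weighted_quadratic_forms(k=k)
        print(f"k={k}: lhs={lhs:.10f}, rhs={rhs:.10f}")
```

Setting `lam=1.0` reduces the check to the rectangular-window identity used in (4.36). The approximation in (4.39) enters only when the individual terms $V_k(n,m,i)$ are assumed equal across $i$.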

To confirm the reasoning for $\lambda = 1$ and the approximation for $\lambda$ close to 1, $V_k(n,m,i)$ for $i = 1, 2, \ldots, n$ and $k = 1, 2$ are evaluated using Monte-Carlo simulations. The corresponding plots are shown in Figures 4-1 and 4-2 for, respectively, $\lambda = 1$ and $\lambda = 0.995$. As can be observed, the conclusion that $V_k(n,m,i)$ does not depend on $i$ when $\lambda = 1$ is validated in Fig. 4-1.

On the other hand, the approximation $V_k(n,m,i) \approx V_k(n,m)$ is less accurate when $\lambda < 1$, as shown in Fig. 4-2. In the absence of a more accurate and algebraically simple characterization for $V_k(n,m,i)$ when $\lambda < 1$, we use (4.39) in the analysis. The validation of the results obtained in the later parts confirms that approximation (4.39) is acceptable. As a final note, the approximation becomes more accurate as $\lambda$ approaches one.


Evaluation  of  W 

The approximation of the quantity $W$ is derived in this part. As for $V_k$, our goal is to characterize this quantity with relatively simple and accurate expressions by utilizing assumptions justified in realistic scenarios.

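Based on whether the indices $i$ and $j$ in $W(n,m,i,j)$ are equal or not, two new
quantities are defined

$$X(n,m,i) = \mathrm{E}\left[\mathrm{tr}\left\{\mathbf{u}(i)\mathbf{u}^H(i)\hat{\mathbf{R}}^{-2}(n)\mathbf{u}(i)\mathbf{u}^H(i)\right\}\right] \tag{4.40}$$

and

$$Y(n,m,i,j) = \mathrm{E}\left[\mathrm{tr}\left\{\mathbf{u}(i)\mathbf{u}^H(i)\hat{\mathbf{R}}^{-2}(n)\mathbf{u}(j)\mathbf{u}^H(j)\right\}\right]. \tag{4.41}$$

Figure 4-1: $V_1$ (top figure) and $V_2$ (bottom figure) versus observation index $i$ for
$n = 500$, $m = 30$ and $\lambda = 1$.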


When using the rectangular window, $X(n,m,i) = X(n,m)$ and $Y(n,m,i,j) =
Y(n,m,i_0,j_0)$ for the same reason as was stated for the case of $V_k(n,m,i)$. Here, $i_0$
and $j_0$ are fixed and not equal.

The quantity $X$ is expressed as

$$X(n,m) = \mathrm{E}\left[\|\mathbf{u}(i)\|_2^2\,\mathbf{u}^H(i)\hat{\mathbf{R}}^{-2}(n)\mathbf{u}(i)\right] \tag{4.42}$$


and we assume that $\|\mathbf{u}(i)\|_2^2$ and $\mathbf{u}^H(i)\hat{\mathbf{R}}^{-2}(n)\mathbf{u}(i)$ are independent random variables
when $n$ and $m$ are of the same order. Namely, $\mathbf{u}^H(i)\hat{\mathbf{R}}^{-2}(n)\mathbf{u}(i)$ is the squared norm of a
vector obtained by rotating $\mathbf{u}(i)$ with the eigenvectors of $\hat{\mathbf{R}}(n)$ and scaling the entries
of the resulting vector with the inverses of the eigenvalues of $\hat{\mathbf{R}}(n)$. While
rotating a vector with an orthogonal matrix has no impact on its norm, scaling its
entries by randomly chosen eigenvalues from a broad support (because $n$ and $m$ are
comparable) makes $\mathbf{u}^H(i)\hat{\mathbf{R}}^{-2}(n)\mathbf{u}(i)$ and $\|\mathbf{u}(i)\|_2^2$ approximately independent. Hence,
$X$ is approximated as

$$X(n,m) \approx \mathrm{tr}\{\mathbf{R}\}\,V_2(n,m). \tag{4.43}$$

Figure 4-2: $V_1$ (top figure) and $V_2$ (bottom figure) versus observation index $i$ for
$n = 500$, $m = 30$ and $\lambda = 0.995$.

The  theoretical  approximation  (4.43)  is  compared  with  the  simulation  results  for 
different  n  and  m  =  30  and  the  plots  are  shown  in  Fig.  4-3.  Excellent  agreement 
is  obtained  even  for  large  values  of  n.  The  input  process  in  this  test  is  Gaussian 
distributed.  A  similar  agreement  is  also  obtained  for  a  uniformly  distributed  binary 
input  sequence. 

When an exponential weighting is employed, $\lambda$ is assumed to be very close to
1. Thus, $X(n,m,i) \approx X(n,m)$ and $Y(n,m,i,j) \approx Y(n,m,i_0,j_0)$, where $i_0 \neq j_0$, and
$X(n,m)$ is evaluated using (4.43).

Figure 4-3: Comparison between the simulated and approximated $X(n, 30)$.

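To evaluate $Y$, two possible decompositions of the weighted sums involving $W(n,m,i,j)$
are given by

$$\sum_{i=1}^{n}\sum_{j=1}^{n}\lambda^{2n-i-j}W(n,m,i,j) = \sum_{i=1}^{n}\lambda^{2(n-i)}X(n,m,i) + \sum_{i=1}^{n}\sum_{\substack{j=1 \\ j\neq i}}^{n}\lambda^{2n-i-j}Y(n,m,i,j) \approx \frac{1-\lambda^{2n}}{1-\lambda^{2}}X(n,m) + 2\,\frac{(1-\lambda^{n})(\lambda-\lambda^{n})}{(1-\lambda)^{2}(1+\lambda)}\,Y(n,m,i_0,j_0) \tag{4.44}$$

and

$$\sum_{i=1}^{n}\sum_{j=1}^{n}\lambda^{2n-i-j}W(n,m,i,j) = \mathrm{E}\left[\mathrm{tr}\left\{\sum_{i=1}^{n}\lambda^{n-i}\mathbf{u}(i)\mathbf{u}^H(i)\,\hat{\mathbf{R}}^{-2}(n)\sum_{j=1}^{n}\lambda^{n-j}\mathbf{u}(j)\mathbf{u}^H(j)\right\}\right] = m\left(M_0(m,n) - 2\delta\lambda^{n}M_1(m,n) + \delta^{2}\lambda^{2n}M_2(m,n)\right). \tag{4.45}$$

By equating the right-hand sides of (4.44) and (4.45), we obtain the relation
between $X$ and $Y$,

$$\frac{1-\lambda^{2n}}{1-\lambda^{2}}X + 2\,\frac{(1-\lambda^{n})(\lambda-\lambda^{n})}{(1-\lambda)^{2}(1+\lambda)}\,Y = m\left(M_0 - 2\delta\lambda^{n}M_1 + \delta^{2}\lambda^{2n}M_2\right), \tag{4.46}$$

and evaluate $Y$ for $X$ approximated using (4.43).

Substituting $\lambda = 1$ (rectangular window) into (4.46), the exact relation between
$X$ and $Y$ reads

$$nX + n(n-1)Y = m\left(M_0 - 2\delta M_1 + \delta^{2}M_2\right). \tag{4.47}$$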
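The closed-form weights multiplying $X$ and $Y$ in the relations above are geometric sums over the diagonal ($i = j$) and off-diagonal ($i \neq j$) index pairs. The short Python sketch below compares brute-force sums against those closed forms, and confirms that for $\lambda = 1$ they reduce to $n$ and $n(n-1)$. The function names are hypothetical helpers introduced only for this check.

```python
import math


def weight_sums(lam: float, n: int) -> tuple:
    """Brute-force sums of lam**(2n-i-j) over i == j and over i != j."""
    diag = sum(lam ** (2 * (n - i)) for i in range(1, n + 1))
    off = sum(
        lam ** (2 * n - i - j)
        for i in range(1, n + 1)
        for j in range(1, n + 1)
        if i != j
    )
    return diag, off


def closed_forms(lam: float, n: int) -> tuple:
    """Closed-form weights of X and Y; only defined for lam != 1."""
    diag = (1 - lam ** (2 * n)) / (1 - lam ** 2)
    off = 2 * (1 - lam ** n) * (lam - lam ** n) / ((1 - lam) ** 2 * (1 + lam))
    return diag, off


lam, n = 0.9, 20
brute = weight_sums(lam, n)
closed = closed_forms(lam, n)
print(all(math.isclose(b, c, rel_tol=1e-9) for b, c in zip(brute, closed)))

# Rectangular window: the weights become n and n(n-1).
print(weight_sums(1.0, n))
```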

4.4 Channel Tracking in Steady State

The theoretical framework developed in the previous section is used in this section
to analyze the performance of the exponentially weighted LS algorithm in the steady
state, i.e., when the number of input observation vectors $n \to \infty$. In the first part
we evaluate the channel estimation MSE. The numerical validation of the derived
expression is given in the second part. In addition, the assumption that the forgetting
factor $\lambda$ is close to 1, used in the derivations, is justified.

4.4.1  Performance  Analysis 

The steady state channel estimation MSE of the exponentially weighted LS algorithm
is evaluated in this part. Note that the channel varies according to the first order
Markov model and the non-unity forgetting factor $\lambda$ limits the estimator memory
and enables the algorithm to accommodate time-variability.

Taking the limit $n \to \infty$ of (4.19), we conclude that the contribution of the
error term due to the initial channel vector to the overall channel estimation MSE
disappears in the steady state. This result is intuitively appealing.

Substituting the approximate expressions for $V_k$ and $W$ into (4.23), the obtained
expression into (4.22) and taking the limit $n \to \infty$ of the result, the power of the
error induced by the random portion of the channel dynamics is in the steady state
given by


poo  _ 

(l  +  a)(l-aA) 


2A  -  1  + 

1  + A  ^ 


(4.48) 


Similarly, substituting the approximations for $V_k$ and $W$ into (4.24) and taking
the limit $n \to \infty$, the power of the error induced by the observation noise is given by


$$P^{\infty}_{\mathrm{noise}} = \frac{m\sigma_v^{2}}{1+\lambda}\,M_1^{\infty}. \tag{4.49}$$


The powers (4.48) and (4.49) are given in terms of the expected first moment of
the SCM in the steady state, $M_1^{\infty}$, defined as

$$M_1^{\infty} = \lim_{n\to\infty}\mathrm{E}\left[M_1(m,n)\right]. \tag{4.50}$$

Note that here only the number of observation vectors $n$ becomes large, while $m$ is
kept constant. However, since $\lambda < 1$, the effective number of observation vectors is
finite so that the SCM might not approach the ensemble correlation matrix $\mathbf{R}$.

Taking the limit $n \to \infty$ of (4.32), the moment is characterized with the fixed
point equation

=  ^ (4,51) 

4.4.2  Numerical  Validation  and  Discussion 

The derived expression for the channel estimation MSE in the steady state is tested
by comparing it with the estimation error obtained via Monte-Carlo simulations
and with characterization (4.8) from [27]. A simulated zero-mean Gaussian input
process with non-identity correlation is processed through a first order Markov varying
channel with the state transition parameter $\alpha = 0.99$ and process noise variance
0.01. The output signal is corrupted with white noise such that the signal-to-noise
ratio (SNR) of the output signal is 10 dB. The channel is estimated using the LS
algorithm with different values of the forgetting factor $\lambda$.

The top figure in Fig. 4-4 shows the comparison between the theory, simulations
and characterization (4.8) derived in [27] over a relatively wide range of forgetting
factors $\lambda$. The bottom figure in Fig. 4-4 shows the comparison between the theory
and simulations over a narrower range of forgetting factors $\lambda$.


Recall that the theoretical expressions (4.48) and (4.49) are derived under the
assumption that $\lambda$ is close to 1. Characterization (4.8) is derived under the
same assumption, with the additional assumption that the state transition parameter
$\alpha \to 1$. As can be observed from the figures, the derived characterization accurately
predicts the performance when $\lambda$ is close to 1. Next we argue that the range of
forgetting factors $\lambda$ not sufficiently close to 1 has no practical importance.


Namely, given that in this example the algorithm is estimating $m = 30$ unknown
channel coefficients, a value of the forgetting factor $\lambda$ whose corresponding effective
averaging window size is not long enough to accommodate the adaptation of $m = 30$
coefficients is not to be used in a practical scenario. Assuming that in the worst
case scenario we need at least one observation per unknown coefficient, the forgetting
factor $\lambda$ should be such that the corresponding effective averaging window size
$n_{\mathrm{eff}}$ is at least 30. A widely used rule of thumb relates the value of the forgetting factor $\lambda$ and
the effective averaging window size $n_{\mathrm{eff}}$ as $n_{\mathrm{eff}} = 1/(1-\lambda)$. This implies that the effective
averaging window length of 30 corresponds to $\lambda = 0.9667$. Therefore, if $\lambda < 0.9667$,
the algorithm fails to track the channel variations reasonably well and these forgetting
factors are not to be used in a practical application. Finally, the comparison in
Fig. 4-4 confirms that the theoretical characterization, derived under the assumption
that $\lambda$ is close to 1, is valid for the range of forgetting factors that are of practical
importance.
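The rule of thumb quoted above is easy to tabulate. The following Python sketch converts between the forgetting factor and the effective averaging window size and reproduces the value $\lambda \approx 0.9667$ for a window of 30 observations. The function names are illustrative assumptions.

```python
def effective_window(lam: float) -> float:
    """Effective averaging window size n_eff = 1 / (1 - lam), for 0 <= lam < 1."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("forgetting factor must satisfy 0 <= lam < 1")
    return 1.0 / (1.0 - lam)


def min_forgetting_factor(n_eff: float) -> float:
    """Forgetting factor whose effective window equals n_eff observations."""
    if n_eff < 1.0:
        raise ValueError("effective window must be at least one observation")
    return 1.0 - 1.0 / n_eff


m = 30  # number of unknown channel coefficients in this example
lam_min = min_forgetting_factor(m)
print(f"smallest practical forgetting factor: {lam_min:.4f}")
print(f"effective window at lambda=0.995: {effective_window(0.995):.0f}")
```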


As a final remark, note that the value of the smallest $\lambda$ to be used in a practical
scenario is a conservative estimate because more than one observation per dimension
is needed for achieving a satisfactory performance.

Figure 4-4: Steady state channel estimation MSE for first-order Markov channel. The
input process has unconstrained covariance and the channel length is 30. The top figure
compares our theoretical prediction, the result from Haykin's text (4.8) and simulations
over a wide range of forgetting factors. The bottom figure compares our theoretical
prediction and simulations over a narrower range of forgetting factors. The values on
the vertical axis in both figures are relative to the L2 norm of the channel vector in
steady state, which is 15 in this case.

4.5 Linear Time-Invariant Channel Identification

This section considers a linear time-invariant (LTI) channel identification problem
with the LS algorithm. The channel estimation and signal prediction mean square
errors are evaluated and the analytical expressions are validated via Monte-Carlo
simulations. We show that at low SNR, a deterioration in the performance appears when
the number of observations is close to the channel length. This effect is characterized
and explained.

4.5.1  Performance  Analysis 

Channel  Estimation  Error 

A performance characterization of the LS-based identification of a linear,
time-invariant channel is obtained from the theory developed in Section 4.3. An LTI
channel is described with $\alpha = 1$ and $\sigma_o^2 = 0$ in model (4.1), i.e., the channel vector
$\mathbf{w}(n) = \mathbf{w}(0) = \mathbf{w}_0$. Because the channel does not vary in time, the input observations
are rectangularly windowed, i.e., $\lambda = 1$. Substituting $\alpha = 1$, $\sigma_o^2 = 0$ and $\lambda = 1$
into (4.19), (4.22) and (4.24), the power of the error induced by the initial channel
vector is given by

$$\mathrm{E}\left[\|\boldsymbol{\varepsilon}_1(n)\|_2^2\right] = m\delta^{2}M_2(m,n), \tag{4.52}$$

where $m$ is the channel length, $\delta$ is the diagonal loading and $M_2(m,n)$ is the expected
second moment corresponding to the SCM. The power of the error induced by the
observation noise is given by

$$\mathrm{E}\left[\|\boldsymbol{\varepsilon}_3(n)\|_2^2\right] = m\sigma_v^{2}\left(M_1(m,n) - \delta M_2(m,n)\right), \tag{4.53}$$

where $\sigma_v^2$ is the variance of the observation noise and $M_1(m,n)$ is the expected first
moment corresponding to the SCM. Note that the error due to the channel dynamics, $\boldsymbol{\varepsilon}_2(n)$,
is 0. Overall, the channel estimation MSE is

$$\mathrm{E}\left[\|\boldsymbol{\varepsilon}(n)\|_2^2\right] = m\sigma_v^{2}M_1(m,n) - m\delta\left(\sigma_v^{2} - \delta\right)M_2(m,n). \tag{4.54}$$

When the input process has identity covariance $\mathbf{I}$, the functional dependence of
the channel estimation MSE on the number of observations $n$, channel length $m$
and diagonal loading parameter $\delta$ is obtained by substituting the closed form
expressions (4.34) and (4.35) for $M_1(m,n)$ and $M_2(m,n)$ into (4.54).

The expectation of the second moment $M_2(m,n)$ appears in the expression for
the channel estimation MSE in a product with the diagonal loading $\delta$, which has
negligible impact on the performance when $n > m$. Consequently, for $n > m$, the
impact of the terms depending on $\delta$ is neglected. Substituting (4.33) into (4.54) yields

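$$\mathrm{E}\left[\|\boldsymbol{\varepsilon}(n)\|_2^2\right] \approx \frac{\sigma_v^{2}}{n-m}\sum_{k=1}^{m}\frac{1}{\lambda_k}, \tag{4.55}$$

where the $\lambda_k$'s are the eigenvalues of the ensemble correlation matrix.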
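To make the bookkeeping of the error terms concrete, the sketch below evaluates the initial-vector and noise-induced error powers of (4.52) and (4.53) and checks that their sum agrees with the combined expression (4.54). The moments `M1` and `M2` are passed in as plain numbers; the values used here are arbitrary placeholders (an assumption for illustration), since the closed forms (4.34) and (4.35) are not reproduced in this part.

```python
def lti_channel_mse(m: int, delta: float, sigma_v2: float,
                    M1: float, M2: float) -> tuple:
    """Return (initial-vector power, noise power, total MSE) for the LTI case."""
    p_init = m * delta ** 2 * M2                 # (4.52)
    p_noise = m * sigma_v2 * (M1 - delta * M2)   # (4.53)
    return p_init, p_noise, p_init + p_noise


# Placeholder moments, NOT computed from (4.34)-(4.35).
m, delta, sigma_v2, M1, M2 = 30, 0.01, 0.1, 0.02, 0.0005
p_init, p_noise, total = lti_channel_mse(m, delta, sigma_v2, M1, M2)

# Combined form (4.54): m*sigma_v^2*M1 - m*delta*(sigma_v^2 - delta)*M2.
combined = m * sigma_v2 * M1 - m * delta * (sigma_v2 - delta) * M2
print(p_init, p_noise, total, abs(total - combined) < 1e-12)
```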

Signal  Prediction  Error 

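When the input process has identity correlation $\mathbf{I}$, (4.28) gives the signal prediction
MSE in terms of the channel estimation MSE.

For an input process of unconstrained correlation $\mathbf{R}$, the signal prediction MSE
is derived by assuming that the diagonal loading $\delta$ is zero (meaning that the number of
observations is greater than the channel length). Using (4.27) we get

$$\mathrm{E}\left[\xi(n)\xi^{*}(n)\right] = \mathrm{tr}\left\{\mathrm{E}\left[\boldsymbol{\varepsilon}(n-1)\boldsymbol{\varepsilon}^H(n-1)\right]\mathbf{R}\right\} + \sigma_v^{2}. \tag{4.56}$$

The correlation matrix of the channel estimation error can, for the LTI channel
identification problem and $\delta = 0$, be shown to be

$$\mathrm{E}\left[\boldsymbol{\varepsilon}(n)\boldsymbol{\varepsilon}^H(n)\right] = \sigma_v^{2}\,\mathrm{E}\left[\sum_{i=1}^{n}\hat{\mathbf{R}}^{-1}(n)\mathbf{u}(i)\mathbf{u}^H(i)\hat{\mathbf{R}}^{-1}(n)\right] = \sigma_v^{2}\,\mathrm{E}\left[\hat{\mathbf{R}}^{-1}(n)\right]. \tag{4.57}$$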

If the input process is Gaussian, the SCM is Wishart distributed. Using the
expression for the expectation of the inverse of the Wishart matrix [27], the covariance
matrix of the channel estimation error becomes

$$\mathrm{E}\left[\boldsymbol{\varepsilon}(n)\boldsymbol{\varepsilon}^H(n)\right] = \frac{\sigma_v^{2}}{n-m-1}\,\mathbf{R}^{-1}. \tag{4.58}$$

For a general non-Gaussian case, the expectation of the SCM inverse is evaluated
using the RMT results. Namely, it is shown in Section 2.5.2 that if the order of
the SCM $m$ and the number of observations $n$ grow large at the same rate such
that $m/n \to c \in (0,1)$, the inverse of the rectangularly windowed SCM almost surely
converges to the scaled inverse of the ensemble correlation matrix (2.54). Noting
that the SCM $\hat{\mathbf{R}}$ defined in (4.4) with $\lambda = 1$ and $\delta = 0$ is a scaled version of the
model for the rectangularly windowed SCM considered in Section 2.5.2, the convergence
result (2.54) is rewritten as

$$\left(\frac{1}{n}\hat{\mathbf{R}}(n)\right)^{-1} \to \frac{1}{1-c}\,\mathbf{R}^{-1} \quad \text{a.s.} \tag{4.59}$$

As elaborated in Section 2.7, the expected inverse of the SCM is approximated
with the limiting quantity in (4.59) such that

$$\mathrm{E}\left[\hat{\mathbf{R}}^{-1}(n)\right] \approx \frac{1}{n-m}\,\mathbf{R}^{-1}. \tag{4.60}$$

Finally, from (4.58) and (4.56) the signal prediction MSE for a Gaussian input
process is

$$\mathrm{E}\left[|\xi(n)|^{2}\right] = \frac{n-2}{n-m-2}\,\sigma_v^{2}, \tag{4.61}$$

which is valid for $n > m + 2$. Note that this constraint carries no practical insight.

Similarly, substituting (4.60) into (4.57) and the obtained result into (4.56) yields
an approximation for the signal prediction MSE when the received signal has general
statistics,

$$\mathrm{E}\left[|\xi(n)|^{2}\right] \approx \frac{n-1}{n-m-1}\,\sigma_v^{2}, \tag{4.62}$$

which is valid for $n > m + 1$. Note that (4.62) also approximates the Gaussian case.
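The two signal prediction MSE characterizations can be compared numerically. The sketch below assumes the closed forms that follow from combining (4.56) with the inverse-SCM expectations evaluated at $n-1$ observations, namely $\sigma_v^2 (n-2)/(n-m-2)$ for Gaussian input and $\sigma_v^2 (n-1)/(n-m-1)$ for general statistics. These forms are this sketch's reading of the derivation and should be checked against (4.61) and (4.62); the function names are hypothetical.

```python
def prediction_mse_gaussian(n: int, m: int, sigma_v2: float) -> float:
    """Assumed Gaussian-input form: sigma_v^2 * (n - 2) / (n - m - 2), n > m + 2."""
    if n <= m + 2:
        raise ValueError("requires n > m + 2")
    return sigma_v2 * (n - 2) / (n - m - 2)


def prediction_mse_general(n: int, m: int, sigma_v2: float) -> float:
    """Assumed general-statistics form: sigma_v^2 * (n - 1) / (n - m - 1), n > m + 1."""
    if n <= m + 1:
        raise ValueError("requires n > m + 1")
    return sigma_v2 * (n - 1) / (n - m - 1)


m, sigma_v2 = 30, 1.0
for n in (40, 60, 120, 1000):
    g = prediction_mse_gaussian(n, m, sigma_v2)
    a = prediction_mse_general(n, m, sigma_v2)
    # Both expressions approach sigma_v^2 as n grows and differ most near n = m.
    print(n, round(g, 4), round(a, 4))
```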
A-Posteriori Signal Prediction MSE

Although not treated in the development of the general theory, the time invariance
of the channel permits a relatively simple analysis of the a-posteriori signal prediction
error. This error, denoted $\xi_a(n)$, is defined as

$$\xi_a(n) = d(n) - \hat{\mathbf{w}}^H(n)\mathbf{u}(n) = \left(\mathbf{w}(0) - \hat{\mathbf{w}}(n)\right)^H\mathbf{u}(n) + v(n). \tag{4.63}$$

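Substituting $\alpha = 1$ and $\lambda = 1$ into (4.14) and plugging the obtained result
into (4.63) yields

$$\xi_a(n) = \delta\,\mathbf{w}_0^H\hat{\mathbf{R}}^{-1}(n)\mathbf{u}(n) + v(n) - \sum_{i=1}^{n}\mathbf{u}^H(i)\hat{\mathbf{R}}^{-1}(n)\mathbf{u}(n)\,v(i). \tag{4.64}$$

Denoting the first and last terms in (4.64) with $e_1(n)$ and $e_2(n)$, and noting that $e_1(n)$
and $e_2(n)$, as well as $e_1(n)$ and $v(n)$, are independent, the mean-square value of the
a-posteriori prediction error becomes

$$\mathrm{E}\left[|\xi_a(n)|^{2}\right] = \mathrm{E}\left[|e_1(n)|^{2}\right] + \mathrm{E}\left[|e_2(n)|^{2}\right] + \sigma_v^{2} - 2\,\mathrm{E}\left[e_2(n)v^{*}(n)\right]. \tag{4.65}$$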

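The power of $e_1(n)$ is evaluated as

$$\mathrm{E}\left[|e_1(n)|^{2}\right] = \mathrm{E}\left[\delta^{2}\,\mathbf{w}_0^H\hat{\mathbf{R}}^{-1}(n)\mathbf{u}(n)\mathbf{u}^H(n)\hat{\mathbf{R}}^{-1}(n)\mathbf{w}_0\right] = m\delta^{2}M_2(m,n). \tag{4.66}$$

Similarly, the expression for the power of $e_2(n)$ is obtained via

$$\mathrm{E}\left[|e_2(n)|^{2}\right] = \sigma_v^{2}\sum_{i=1}^{n}\mathrm{E}\left[\mathbf{u}^H(n)\hat{\mathbf{R}}^{-1}(n)\mathbf{u}(i)\mathbf{u}^H(i)\hat{\mathbf{R}}^{-1}(n)\mathbf{u}(n)\right] = m\sigma_v^{2}\left(M_1(m,n) - \delta M_2(m,n)\right). \tag{4.67}$$

The cross-power between $e_2(n)$ and the noise sample $v(n)$ is

$$\mathrm{E}\left[e_2(n)v^{*}(n)\right] = \mathrm{E}\left[v(n)\,\mathbf{u}^H(n)\hat{\mathbf{R}}^{-1}(n)\mathbf{u}(n)\,v^{*}(n)\right] = m\sigma_v^{2}M_1(m,n). \tag{4.68}$$

Finally, substituting (4.66), (4.67) and (4.68) into (4.65) yields

$$\mathrm{E}\left[|\xi_a(n)|^{2}\right] = \sigma_v^{2} - m\sigma_v^{2}M_1(m,n) + m\left(\delta - \sigma_v^{2}\right)\delta M_2(m,n). \tag{4.69}$$

If the input process is of identity correlation, the moments $M_1$ and $M_2$ are calculated
through (4.34) and (4.35). If, on the other hand, the input observations are of
arbitrary correlation $\mathbf{R}$, the a-posteriori MSE is computed for $n > m$ by neglecting $\delta$ and
evaluating the moment $M_1$ using (4.33).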


4.5.2  Theoretical  versus  Numerical  Results 

To test the accuracy of the derived expressions, the mean-square values of the channel
estimation and signal prediction errors are computed via Monte-Carlo simulations and
used as the ground truth. A channel with impulse response $\mathbf{w}_0$ processes the input
signal and the output is corrupted with the observation noise whose variance is $\sigma_v^2$.

The plots of the mean-square values of the channel estimation and signal prediction
errors versus the number of observations when the signal-to-noise ratio is 40 dB are shown
in Fig. 4-5 and Fig. 4-6, respectively. The input data stream is an i.i.d. standard
Gaussian random process, the diagonal loading parameter used in the RLS algorithm
is $\delta = 0.01$ and the channel has length 30.

The corresponding plots for SNR = 5 dB, $\delta = 0.1$ and i.i.d. standard Gaussian
input are given in Fig. 4-7 and Fig. 4-8. When the input process is a uniform bipolar
input sequence with SNR = 5 dB and $\delta = 0.1$, the comparisons between the simulations
and theoretical predictions of the channel estimation and signal prediction MSEs are
shown in Fig. 4-9 and Fig. 4-10.

When the input is a correlated multivariate Gaussian process, the corresponding
performance curves for SNR = 10 dB and channel length 30 are shown in Fig. 4-11.

Figure 4-5: Channel estimation MSE vs number of observations. The input process is
uncorrelated Gaussian, the channel length is 30 and SNR = 40 dB. The corresponding
result from Haykin's text is (4.10).

Figure 4-6: Signal prediction MSE vs number of observations for independent Gaussian
input, channel of 30 taps and 40 dB SNR. The result from Haykin's text is (4.11).

Figure 4-7: Channel estimation MSE vs number of observations for independent
Gaussian input, channel of length 30 and SNR of 5 dB.

Figure 4-8: Signal prediction MSE vs number of observations for independent Gaussian
input, channel of length 30 and SNR of 5 dB.

Figure 4-9: Channel estimation MSE vs number of observations for bipolar uniform
input, channel of length 30 and SNR of 5 dB.

Figure 4-10: Signal prediction MSE vs number of observations for bipolar uniform
input, channel of length 30 and SNR of 5 dB.

Figure 4-11: Channel estimation MSE versus number of observations when the input
process has an unconstrained correlation matrix, the channel is of length 30 and the SNR is 10
dB.

The figures also include the plots of the corresponding analytical expressions
derived in Haykin's text [27] and given in (4.10) and (4.11). The presented plots show
that the derived expressions closely match the experimental curves. In addition,
Figures 4-7, 4-8, 4-9 and 4-10 reveal a deterioration in the algorithm's performance when
the SNR is relatively small and the number of observations becomes close to the
channel length. This happens irrespective of the statistics of the input signal. We
explain this effect in the following part.

4.5.3  Performance  Deterioration 

In the following analysis the input process is assumed i.i.d., i.e., $\mathbf{R} = \mathbf{I}$. To explain
why the channel estimation and signal prediction errors exhibit a bump when the
number of observations $n$ is around the channel length $m$, the algorithm's update
equation is exploited [27]

$$\hat{\mathbf{w}}(n) = \hat{\mathbf{w}}(n-1) + \hat{\mathbf{R}}^{-1}(n)\mathbf{u}(n)\,\xi^{*}(n). \tag{4.70}$$

A gain vector $\mathbf{k}(n) = \hat{\mathbf{R}}^{-1}(n)\mathbf{u}(n)$ gives the direction of the update of the
estimated channel impulse response. Its expected squared norm is

$$\mathrm{E}\left[\mathbf{k}^H(n)\mathbf{k}(n)\right] = \mathrm{E}\left[\mathbf{u}^H(n)\hat{\mathbf{R}}^{-2}(n)\mathbf{u}(n)\right] = V_2(n,m). \tag{4.71}$$

The comparison between the Monte-Carlo simulated $\mathrm{E}\left[\|\mathbf{k}(n)\|_2^2\right]$ and its analytical
characterization given in (4.71) and (4.36) is shown in Fig. 4-12. Aside from a
very close match between the simulations and theory, it is observed that the squared
norm of the gain vector exhibits a bump when the number of observations $n$ is close
to the channel length $m$.

Figure 4-12: Expected squared norm of the gain vector vs number of observations $n$.
The channel length is 30.

To grasp the intuition behind such a behavior, the norm of the gain vector is
expressed in terms of the eigenvalues $\lambda_k$ (arranged in decreasing order) and the
corresponding eigenvectors $\mathbf{q}_k$ of $\hat{\mathbf{R}}(n)$,

$$\|\mathbf{k}(n)\|_2^2 = \sum_{k=1}^{\min(m,n)}\frac{\left|\mathbf{q}_k^H\mathbf{u}(n)\right|^{2}}{\lambda_k^{2}}. \tag{4.72}$$

Note that for $n < m$, the $m - n$ smallest eigenvalues of $\hat{\mathbf{R}}(n)$ are equal to $\delta$. The
eigenvectors corresponding to those "trivial" eigenvalues are orthogonal to the
subspace spanned by the received snapshots $\mathbf{u}(1), \ldots, \mathbf{u}(n)$. Therefore, the summation
in (4.72) goes up to index $\min(m,n)$.

As can be noted from (4.72), the norm of the gain vector is dominated by the
smallest "non-trivial" eigenvalue $\lambda_{\min(m,n)}$. This eigenvalue becomes extremely small
for $n \approx m$, which triggers the peak. To understand why, it is assumed that the
observation vectors $\mathbf{u}(n)$ are drawn independently and uniformly from an $m$-sphere. Thus,
each one contributes statistically equally to the energy along each "non-trivial"
direction in the eigenspace of $\hat{\mathbf{R}}(n)$. A new non-trivial direction is acquired with each $\mathbf{u}(n)$
when $n < m$ and the energy along that direction is approximately $1/n$ of the
incoming observation vector's energy. Therefore, as $n$ approaches $m$, the energy along the
newly acquired direction decreases. Consequently, the minimal non-trivial eigenvalue
decreases, which causes an increase in the norm of the gain vector. When $n$ exceeds $m$,
$\hat{\mathbf{R}}(n)$ has full "non-trivial" rank, so each new observation vector contributes energy
along all directions. The smallest eigenvalue increases, so the norm of the gain vector
decreases.

To visualize how the minimal "non-trivial" eigenvalue depends on the observation
index $n$, the approach from [18] is adopted. Namely, the limiting eigenvalue density
is viewed as a probability density function of a randomly chosen eigenvalue.
This is justified by the fact that if, in a series of different realizations of $\hat{\mathbf{R}}(n)$, an
eigenvalue is chosen uniformly at random for each realization, then the appropriately
scaled histogram of the collection of those eigenvalues closely matches the plot of
the limiting density [15]. Therefore, a cumulative distribution function of the $k$-th largest
eigenvalue, $F_k(x) = \Pr\left[\lambda_k \leq x\right]$, is expressed as

$$F_k(x) = \sum_{i=m-k+1}^{m}\binom{m}{i}F^{i}(x)\left(1 - F(x)\right)^{m-i}, \tag{4.73}$$

where $F(x)$ is the cumulative distribution function corresponding to the limiting
eigenvalue density. The expected value of the smallest non-trivial eigenvalue is computed from
its cumulative distribution function and plotted for different numbers of observations
$n$ in Fig. 4-13. It can be observed that the behavior of the smallest non-trivial eigenvalue
is in line with the intuitive reasoning and the plots in Fig. 4-12.

Figure 4-13: Expected values of the smallest non-trivial eigenvalue vs number of
observations. The number of dimensions is 30.
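The cumulative distribution function of the $k$-th largest eigenvalue can be illustrated with the standard order-statistics formula, under the assumption stated in the text that the eigenvalues behave as independent draws from a common distribution $F$. The Python sketch below evaluates that formula for a given value of $F(x)$; the function name and the i.i.d. order-statistics form are assumptions of this sketch rather than a transcription of (4.73).

```python
from math import comb


def kth_largest_cdf(F_x: float, m: int, k: int) -> float:
    """Pr[k-th largest of m i.i.d. draws <= x], given F_x = F(x).

    The k-th largest value is <= x exactly when at least m - k + 1
    of the m draws are <= x, which gives a binomial tail sum.
    """
    if not 1 <= k <= m:
        raise ValueError("k must satisfy 1 <= k <= m")
    return sum(
        comb(m, i) * F_x ** i * (1.0 - F_x) ** (m - i)
        for i in range(m - k + 1, m + 1)
    )


m = 30
F_x = 0.05  # probability that a randomly chosen eigenvalue is below x
print("largest  eigenvalue <= x:", kth_largest_cdf(F_x, m, 1))   # equals F_x**m
print("smallest eigenvalue <= x:", kth_largest_cdf(F_x, m, m))   # equals 1-(1-F_x)**m
```

The smallest eigenvalue falls below a given threshold with much higher probability than a typical one, which is consistent with the gain vector norm being dominated by that eigenvalue.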

Although the norm of the gain vector $\mathbf{k}(n)$ depends only on the input process,
the performance degradation for $n \approx m$ tends to appear at lower SNRs. The reason
is revealed from (4.70). Namely, the channel impulse response is updated based on
the gain vector $\mathbf{k}(n)$ and the signal prediction error $\xi(n)$. As the SNR decreases, the
signal prediction error increases. In that case, the bump in the gain vector appearing
when $n \approx m$ is further amplified, causing an increase in the norm of the correction term
in update equation (4.70). Consequently, the channel estimation error increases. In
turn, the signal prediction error at the next iteration gets larger. Overall, the channel
estimation error and the signal prediction error cause each other's increase from
one iteration to the next. This effect persists as long as the bump in the norm of the gain
vector lasts.

As a final note, the SNR at which the performance degradation arises is determined
by other system parameters, mainly by the channel length $m$. Following the presented
intuitive reasoning, it is deduced that the largest SNR for which the performance
deteriorates when $n$ is close to $m$ gets larger as the channel length increases.

4.6 Sliding Window RLS Algorithm

The estimation performance of the sliding window LS algorithm is studied in this
section. The analytical expression for the channel estimation MSE is validated via
simulations. Also, the sliding window length which minimizes the channel estimation
MSE for uncorrelated input is approximated.

4.6.1 Performance Analysis

The channel estimation MSE of the sliding window LS algorithm used to estimate
a first order Markov channel is obtained using the developed theoretical framework.
Here, we set the forgetting factor $\lambda = 1$ and the number of observations $n$ is viewed
as the sliding window length.

Thus, substituting $\lambda = 1$ and the approximations for $V_k$ and $W$ into (4.19), (4.22)
and (4.24), the error powers $P_1(n)$, due to the initial channel vector, $P_2(n)$, due to
the channel dynamics, and $P_3(n)$, due to the observation noise, are evaluated in terms
of the expected moments $M_1(m,n)$ and $M_2(m,n)$. Assuming $n > m$, which
alleviates the need for the diagonal loading $\delta$, the power of the error due to the initial
channel vector is

Pi{n) 


1  —  an  1  —  a^n 
2a^(l  —  a”)(a  —  a'^)m  —  Mi(m, n)tr{R}j 

n(n  —  1)(1  —  a)2(l  +  a) 


the  power  of  the  error  due  to  the  channel  dynamics  is 


Poin)  =  ma‘ 


.1  —  a 


2n 


1  — 


n  —  a 


,  1  —  a 
1  -a2 


a' 


2n 


,2(l-a”)(l-a”+i)m 
(1  —  a)2(l  +  a)  n 

TTl  ~ 

— Mi(m,  n)tr{R} 


2  1  —  Mi(m, n)tr{R}  |  2na  2a(l  —  a”)  2a^(l  —  a^”)  | 

n(n-l)  l(l-a)2(l  +  a)  “  (1  -  a)3  +  (1  -  a)3(l  +  a)2  j 

(4.75) 
and the power of the error due to the observation noise is

$$P_3(n) = m\sigma_v^{2}\left(M_1(m,n) - \delta M_2(m,n)\right). \tag{4.76}$$

Overall, the channel estimation MSE is given by

$$\mathrm{E}\left[\|\boldsymbol{\varepsilon}(n)\|_2^2\right] = P_1(n) + P_2(n) + P_3(n). \tag{4.77}$$

When the ensemble correlation matrix of the input process is the identity $\mathbf{I}$, the
expressions (4.34) and (4.35) for the moments $M_1$ and $M_2$ are used to evaluate $P_1$, $P_2$ and
$P_3$. Also, in that case the signal prediction MSE is approximated with (4.28). On
the other hand, when the input observations are of an unconstrained correlation $\mathbf{R}$,
$M_1(m,n)$ is evaluated for $n > m$ using (4.33).

4.6.2  Theoretical  versus  Numerical  Results 

The accuracy of the derived expressions is tested using Monte-Carlo simulations.
When $\mathbf{R} = \mathbf{I}$, the corresponding plots for a slowly varying channel ($\alpha = 0.995$) are
shown in Fig. 4-14 and Fig. 4-15. When the channel varies more rapidly, i.e., for $\alpha = 0.95$,
the corresponding plots for the channel estimation and signal prediction errors are
shown in Fig. 4-16 and Fig. 4-17.

The corresponding comparisons between the theory and simulations when the
input has an unconstrained correlation $\mathbf{R}$ are shown in Fig. 4-18 and Fig. 4-19 for
slowly ($\alpha = 0.995$) and more rapidly ($\alpha = 0.95$) varying channels, respectively.

In all cases, the input process is Gaussian distributed, the channel has length $m = 30$,
the process noise is of variance $\sigma_o^2 = 0.01$ and the SNR is 10 dB. As a technical detail, the
SNR is defined at the channel output. Since the input signal's power per channel tap
is $\mathrm{tr}\{\mathbf{R}\}/m$ and the expected squared L2 norm of the channel vector in steady state
is $m\sigma_o^2/(1-\alpha^2)$, the SNR is given by

$$\mathrm{SNR} = 10\log_{10}\frac{\sigma_o^{2}\,\mathrm{tr}\{\mathbf{R}\}}{\left(1-\alpha^{2}\right)\sigma_v^{2}}. \tag{4.78}$$

The vertical axis in each plot is normalized with the L2 norm of the channel vector
in the steady state.
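The SNR definition at the channel output can be turned into a small calculation. The sketch below assumes that the steady-state expected squared norm of a first order Markov channel with state transition parameter $\alpha$ and per-tap process noise variance $\sigma_o^2$ is $m\sigma_o^2/(1-\alpha^2)$, and that the output signal power is this quantity times the per-tap input power $\mathrm{tr}\{\mathbf{R}\}/m$. The symbol names and helper functions are assumptions introduced for illustration.

```python
import math


def steady_state_channel_power(m: int, sigma_o2: float, alpha: float) -> float:
    """Expected squared L2 norm of the channel vector in steady state."""
    return m * sigma_o2 / (1.0 - alpha ** 2)


def snr_db(tr_R: float, sigma_o2: float, alpha: float, sigma_v2: float) -> float:
    """SNR at the channel output in dB, for observation noise variance sigma_v2."""
    signal_power = sigma_o2 * tr_R / (1.0 - alpha ** 2)
    return 10.0 * math.log10(signal_power / sigma_v2)


def noise_variance_for_snr(snr: float, tr_R: float, sigma_o2: float,
                           alpha: float) -> float:
    """Observation noise variance that yields the requested SNR (in dB)."""
    signal_power = sigma_o2 * tr_R / (1.0 - alpha ** 2)
    return signal_power / (10.0 ** (snr / 10.0))


# With m = 30, process noise variance 0.01 and alpha = 0.99 (the setting of
# Section 4.4.2), the channel power is about 15, as quoted for Fig. 4-4.
print(steady_state_channel_power(30, 0.01, 0.99))

# Noise variance for 10 dB SNR, identity input correlation (tr{R} = m = 30).
print(noise_variance_for_snr(10.0, 30.0, 0.01, 0.995))
```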

Note that as the window length $n \to \infty$, the estimation MSE asymptotically
approaches the 0 dB level. This is because all the channel variations are averaged out with
an infinitely large rectangular window so that the estimated channel vector approaches
0. Therefore, the estimation error converges to the L2 norm of the channel vector
(i.e., to 0 dB in the figures due to normalization).

For a finite averaging window length n, the performance curves behave
differently depending on the value of the state transition parameter a. When
the channel varies rapidly, the performance curve is above the 0 dB level, as
can be observed in Figures 4-16, 4-17 and 4-19, where a = 0.95. Effectively, the
RLS algorithm is not able to track such a channel and the channel estimation
error is minimized by the trivial estimator w = 0. On the other hand, when the
channel varies slowly, the optimal channel estimation MSE is below the 0 dB
level and is achieved for a finite averaging window length. This can be
observed in Figures 4-14, 4-15 and 4-18, where a = 0.995.

Furthermore, the performance curves corresponding to the smaller a, shown in
Figures 4-16 and 4-17, exhibit the counterintuitive result that when the number
of observations drops below the dimensionality of the system, the MSE decreases
as the number of observations is decreased. Unlike the downward trend in MSE
with a decreasing number of observations seen elsewhere, this drop is not due
to the tracking of channel dynamics. It has been studied and explained in
Section 4.5.3.

Finally, recall that the characterization of the signal prediction MSE has been
derived by assuming that the state transition parameter a is close to 1. Besides
the good agreement between the theoretical prediction and simulations of the
signal prediction MSE for a slowly varying channel with a = 0.995 (not shown in
figures), a good agreement is obtained even for a rapidly varying channel with
a = 0.95, shown in Fig. 4-17. Although this latter case does not have practical
significance because the RLS algorithm is not able to track the channel
reasonably well, it confirms that the derived characterization is valid at
least for the range of values of a corresponding to the cases of practical
significance.

Figure 4-14: Theoretical versus simulated channel estimation MSE for a = 0.995,
σ_o = 0.1 and SNR = 10 dB.


4.6.3  Optimal  Window  Length 


An approximate expression for the optimal window length n_opt is derived in this
part using the observation that the RLS algorithm tracks the channel reasonably
well when the state transition parameter a is close to 1. Therefore, we assume
that the channel vector follows a random walk (i.e., a → 1). In that case, the
steady state mean square channel estimation error for a sliding window of
length n > m is obtained from (4.77) by letting a → 1 and δ ≈ 0 as


E  [ii£(")iig  = 


,m(2n^  —  nm  +  5m 


6(n 


4^)  ,  .,2 

- 1-  cr„  - 


m 


m] 


n 


m 


(4.79) 


Figure 4-15: Theoretical versus simulated signal prediction MSE for a = 0.995,
σ_o = 0.1 and SNR = 10 dB.

Figure 4-16: Theoretical versus simulated channel estimation MSE for a = 0.95,
σ_o = 0.1 and SNR = 10 dB.

Figure 4-17: Theoretical versus simulated signal prediction MSE for a = 0.95,
σ_o = 0.1 and SNR = 10 dB.

Figure 4-18: Theoretical versus simulated channel estimation MSE for a = 0.995,
σ_o = 0.1, SNR = 10 dB, and input process of unconstrained correlation.

Figure 4-19: Theoretical versus simulated channel estimation MSE for a = 0.95,
σ_o = 0.1, SNR = 10 dB, and input process of unconstrained correlation.

Setting the first derivative of (4.79) to zero (or equating the error terms
induced by the channel dynamics and observation noise) and solving for n yields
the expression for the optimal sliding window length. When R = I, this
expression simplifies to

\[ n_{\mathrm{opt}} = m + \sqrt{\frac{m^2 + m + 6\,\sigma_v^2/\sigma_o^2}{2}}. \tag{4.80} \]

The accuracy of (4.80) is tested via simulations for different values of a, SNR
and process noise variance σ_o², and the results are summarized in Table 4.1.
The entries n_opt and n_opt^sim pertain, respectively, to (4.80) and the
simulated optimal window length. The quantity Δ is the difference between the
simulated MSEs corresponding to window lengths n_opt and n_opt^sim. The channel
length in all considered cases is 30. As can be observed from the table, the
error due to using n_opt evaluated from (4.80) does not exceed 0.05 dB.
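
The evaluation of the optimal window length can be sketched as follows. This is a hedged illustration: it assumes that (4.80) has the form n_opt = m + sqrt((m² + m + 6σ_v²/σ_o²)/2) and that the SNR is defined at the channel output with R = I, so that σ_v²/σ_o² = m / ((1 - a²) · SNR). The function names are hypothetical.

```python
import math


def noise_to_process_ratio(a, snr_db, m):
    """sigma_v^2 / sigma_o^2 implied by the output SNR when R = I (tr{R} = m)."""
    channel_power_per_unit_process_noise = m / (1.0 - a * a)
    return channel_power_per_unit_process_noise / (10.0 ** (snr_db / 10.0))


def optimal_window_length(m, ratio):
    """Assumed form of (4.80); ratio = sigma_v^2 / sigma_o^2."""
    return m + math.sqrt((m * m + m + 6.0 * ratio) / 2.0)


if __name__ == "__main__":
    m = 30
    for a, snr_db in [(0.99, 10.0), (0.98, 10.0), (0.99, 5.0)]:
        n_opt = optimal_window_length(m, noise_to_process_ratio(a, snr_db, m))
        print(f"a = {a}, SNR = {snr_db} dB: n_opt = {n_opt:.2f}")
```

Under these assumptions the three cases evaluate to roughly 60, 56 and 74 after rounding, in line with the n_opt entries of Table 4.1 for the same parameters. Note that, for a fixed SNR, the ratio σ_v²/σ_o² does not depend on σ_o² itself.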

Finally, note that the outlined method for finding the optimal sliding window
length is derived by assuming a random walk model for the channel. The
simulation results in Table 4.1 imply that the expression is fairly accurate
when the channel varies according to a first order Markov process whose state
transition parameter a is close to 1. This is, in fact, the regime of practical
importance, because the estimation error when the state transition parameter a
is not sufficiently close to 1 is relatively high for any finite averaging
window length, implying that the channel is effectively not being tracked.

Table 4.1: Optimal Sliding Window Length

a       σ_o²      SNR      n_opt   n_opt^sim   Δ
1       0.01      10 dB    52      48          0.06 dB
0.99    0.01      10 dB    60      63          < 0.05 dB
0.98    0.01      10 dB    56      65          < 0.05 dB
0.99    0.01      20 dB    50      53          0.006 dB
0.98    0.0001    20 dB    52      58          0.017 dB
0.99    0.01      5 dB     74      77          0.027 dB


4.7 Insights into Exponentially Weighted and Sliding Window LS Algorithms

This section develops some practical results about the LS algorithm when the
input process has identity correlation. These results remain valid even if the
input process has a few distinct eigenvalues. Namely, it has been observed that
the eigenvalues corresponding to the noise-only subspace of an input process
with a few distinct eigenvalues behave approximately as if the input process
had identity correlation. This qualitative observation has also been exploited
to develop algorithms that estimate the number of signals embedded in noise
from a small number of samples [30].

4.7.1  Effective  Number  of  Observations 

When the input observations are exponentially windowed with forgetting factor λ,
it is often of practical importance to assess the equivalent number of
stationary, rectangularly windowed observation vectors that give rise to the
same quality of the SCM. A commonly used rule of thumb is n_eff = 1/(1 - λ). A
more accurate relation is derived in this part using random matrix theory
results.
Formally, the problem can be formulated as finding n_eff such that

\[ \lim_{n \to \infty} (1 - \lambda) \sum_{k=1}^{n} \lambda^{n-k} \mathbf{u}_\lambda(k) \mathbf{u}_\lambda^H(k) \approx \frac{1}{n_{\mathrm{eff}}} \sum_{k=1}^{n_{\mathrm{eff}}} \mathbf{u}_R(k) \mathbf{u}_R^H(k), \tag{4.81} \]

where the vectors u_λ(k) and u_R(k) of length m originate from identical,
stationary processes of zero mean and identity correlation.

The SCMs on the left and right hand sides of (4.81) are denoted, respectively,
R̂_λ and R̂_R. The limiting Eigenvalue Density Function μ_λ(x) of R̂_λ is
parameterized with q = m(1 - λ) and is given by (2.63) and (2.4). It has a
compact support whose endpoints x_1 and x_2 are the solutions of

\[ x_{1,2} = \log x_{1,2} + q + 1. \tag{4.82} \]

The limiting Eigenvalue Density Function μ_R of R̂_R is given by the
Marcenko-Pastur law (2.56) and is parameterized with c = m/n_eff. For c < 1,
the endpoints of its support are given by

\[ l_{\min} = (1 - \sqrt{c})^2, \tag{4.83} \]

\[ l_{\max} = (1 + \sqrt{c})^2. \tag{4.84} \]

Since the plots of μ_λ and μ_R look alike when c < 1 (because then μ_R has no
mass at 0), equation (4.81) is approximately solved by matching the endpoints of
the two density functions. Thus, the effective number of observations n_eff for
which the upper limits x_1 and l_max of μ_λ and μ_R coincide is, from (4.82) and
(4.84), given by

\[ (1 + \sqrt{c})^2 = 2 \log(1 + \sqrt{c}) + q + 1, \tag{4.85} \]

where c = m/n_eff. The requirement c < 1 is equivalent to q < 1.6. A similar
equation can be obtained for the lower limits to coincide.

To derive a handier relation between λ and n_eff, the first two terms of the
Taylor series expansion of the log function (the series converges because c < 1)
are taken into account, i.e., log(1 + √c) ≈ √c - c/2. Equation (4.85) then
reduces to 2c = q, which yields

\[ n_{\mathrm{eff}} = \frac{2}{1 - \lambda}. \tag{4.86} \]

The same expression is obtained if the lower limits x_2 and l_min are matched
and the log function is approximated with the first two terms of its Taylor
series.

Figure 4-20: The limiting Eigenvalue Density Functions of the exponentially
weighted SCM and SCMs with rectangular windowing with the effective observation
window size computed from the derived expression (4.86) and the conventional
rule of thumb 1/(1 - λ). The dimension of the observation vector is m = 30, the
forgetting factor is λ = 0.99 and q = 0.3.
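
To see how much the two-term approximation changes the answer, the endpoint-matching condition can also be solved numerically. The sketch below is illustrative only: it assumes the condition reads (1 + s)² = 2 log(1 + s) + q + 1 with s = √c, c = m/n_eff and q = m(1 - λ), and it uses a plain bisection. The function names are hypothetical.

```python
import math


def solve_sqrt_c(q, tol=1e-12):
    """Solve (1+s)^2 = 2*log(1+s) + q + 1 for s = sqrt(c) in (0, 1) by bisection."""
    if not 0.0 < q < 3.0 - 2.0 * math.log(2.0):
        raise ValueError("a root with c < 1 requires 0 < q < 3 - 2*log(2) (about 1.6)")

    def g(s):
        return (1.0 + s) ** 2 - 2.0 * math.log(1.0 + s) - 1.0 - q

    lo, hi = 0.0, 1.0  # g(lo) < 0 < g(hi), and g is increasing on (0, 1)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def effective_observations(m, lam):
    """Return (endpoint-matched n_eff, two-term value 2/(1-lam), rule of thumb 1/(1-lam))."""
    q = m * (1.0 - lam)
    c = solve_sqrt_c(q) ** 2
    return m / c, 2.0 / (1.0 - lam), 1.0 / (1.0 - lam)


if __name__ == "__main__":
    for lam in (0.99, 0.95):
        exact, two_term, rule = effective_observations(30, lam)
        print(f"lambda = {lam}: matched {exact:.1f}, 2/(1-lambda) {two_term:.1f}, "
              f"1/(1-lambda) {rule:.1f}")
```

For m = 30 and λ = 0.99 the numerically matched value is roughly 179, which is much closer to 2/(1 - λ) = 200 than to the rule of thumb 1/(1 - λ) = 100. The upper bound on q in the guard is the exact value behind the threshold q < 1.6.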

The relation (4.86) is tested by comparing the plots of the limiting EDFs μ_λ
and μ_R for λ = 0.99 and λ = 0.95 in Fig. 4-20 and Fig. 4-21, respectively. The
limiting EDFs corresponding to the SCM with rectangular windowing whose
observation window size is computed using the conventional rule of thumb
1/(1 - λ) are also shown. The problem dimension in both cases is m = 30. Note
that the parameter q corresponding to Fig. 4-21 is 1.5, which is close to the
largest value (1.6) for which a resemblance between the two plots is possible.
Also, since (4.86) is an approximation of (4.85), the endpoints of the two
density functions do not coincide exactly.

Figure 4-21: The limiting Eigenvalue Density Functions of the exponentially
weighted SCM and SCMs with rectangular windowing with the effective observation
window size computed from the derived expression (4.86) and the conventional
rule of thumb 1/(1 - λ). The dimension of the observation vector is m = 30, the
forgetting factor is λ = 0.95 and q = 1.5.
4.7.2 Exponentially Weighted versus Sliding Window RLS


The LS algorithm suffers from performance deterioration if the SCM is
ill-conditioned. The appearance of a bump in the channel estimation and signal
prediction MSEs, described for the LTI channel identification case in
Section 4.5.3, is intrinsically caused by an ill-conditioned SCM. The
exponentially weighted and sliding window LS algorithms are therefore compared
based on how well the corresponding SCMs are conditioned. Namely, comparing the
plots of the limiting eigenvalue density functions in Figures 4-20 and 4-21,
one may observe that the Eigenvalue Density Function of the rectangularly
windowed SCM tends to have eigenvalues closer to zero. In other words, it is
easier to bound the eigenvalues away from zero in the exponentially windowed
algorithm than in its rectangularly windowed counterpart. Consequently, it can
be conjectured that exponential weighting with forgetting factor λ could be
better suited than rectangular windowing with window length n.

More specifically, given the sliding window length n, it is possible to
determine an equivalent forgetting factor λ such that the upper end points of
the limiting eigenvalue densities of the corresponding SCMs coincide.
Consequently, the lower end point of the density pertaining to the exponentially
weighted SCM is slightly further away from the origin, meaning that the SCM is
better conditioned. For a given sliding window length n, the corresponding λ
is, using (4.85), given by

\[ \lambda = 1 - \frac{\left(1 + \sqrt{m/n}\right)^2 - 2 \log\left(1 + \sqrt{m/n}\right) - 1}{m}. \tag{4.87} \]

The behaviors of the sliding window LS with window length n and the LS with the
exponentially weighted window with forgetting factor λ, computed using (4.87),
are compared via simulations. The simulation study shows that the exponentially
weighted LS outperforms the sliding window LS for both LTI and first order
Markov channels. Fig. 4-22 and Fig. 4-23 show the performance plots of the
considered algorithms. It can be conjectured that, given the window length n in
a sliding window RLS, it is always possible to compute a forgetting factor λ
using (4.87) such that the exponentially weighted RLS outperforms the sliding
window RLS in steady state.
[Plot; legend: exponentially weighted RLS; horizontal axis: forgetting factor
(<-> equivalent length of the sliding window)]

Figure 4-22: Exponentially weighted versus sliding window RLS for SNR = 10 dB.
The integers along the green curve represent the sliding window lengths
equivalent to the corresponding values of forgetting factor.

Furthermore, at low SNRs, the optimal window length of the sliding window RLS
and the optimal forgetting factor of the exponentially weighted RLS are related
as in (4.87). At higher SNRs, the optimal forgetting factor is slightly below
the value computed using (4.87) for a given n = n_opt. However, when the SNR is
high, the difference between the MSE at the optimal forgetting factor and that
at the λ corresponding to the optimal sliding window length n_opt is negligible.

4.7.3  Optimal  Value  of  Forgetting  Factor 

[Plot; horizontal axis: forgetting factor (<-> equivalent length of the sliding
window)]

Figure 4-23: Exponentially weighted versus sliding window RLS for SNR = 5 dB.
The integers along the green curve represent the sliding window lengths
equivalent to the corresponding values of forgetting factor.

Table 4.2: Optimal Forgetting Factor Value

a      σ_o²    SNR      λ_opt^sim   λ_opt^(1)   Δ_1 [dB]   λ_opt^(2)   Δ_2 [dB]
0.99   0.01    5 dB     0.975       0.973       0.045      0.977       0.04
0.99   0.01    10 dB    0.960       0.967       0.065      0.972       0.155
0.99   0.01    20 dB    0.949       0.962       0.153      0.968       0.309
0.99   0.01    40 dB    0.942       0.962       0.235      0.968       0.375
0.98   1       5 dB     0.980       0.969       0.107      0.973       0.075
0.98   1       10 dB    0.969       0.964       0.074      0.970       0.019
0.98   1       20 dB    0.953       0.961       0.037      0.968       0.108
0.98   1       40 dB    0.956       0.961       0.048      0.968       0.137

The optimal λ for the exponentially weighted LS algorithm is approximated by
using the expression for the optimal window length of the sliding window RLS
algorithm (4.80) and the relationship between λ and the effective number of
observations n in the rectangularly windowed SCM (4.86). The justification for
such an approach is the observation that, even at large SNRs, the difference
between the MSE at the optimal forgetting factor and that at the λ
corresponding to the optimal sliding window length n_opt is negligible. Overall,
from (4.80) and (4.86),

\[ \lambda_{\mathrm{opt}}^{(1)} = 1 - \frac{2}{m + \sqrt{\dfrac{m^2 + m + 6\,\sigma_v^2/\sigma_o^2}{2}}}. \tag{4.88} \]

A potentially more accurate expression for the optimal forgetting factor,
λ_opt^(2), is obtained by substituting n = n_opt, evaluated using (4.80), into
the exact expression (4.87).

The derived expressions are tested by comparing λ_opt^(1) and λ_opt^(2) with the
optimal forgetting factor λ_opt^sim obtained via simulations for given system
parameters. The results are summarized in Table 4.2. The channel length in all
cases is 30. The quantities Δ_1 and Δ_2 represent the performance loss due to
choosing, respectively, the theoretically calculated values λ_opt^(1) and
λ_opt^(2) instead of λ_opt^sim. The optimal forgetting factor (4.9) from [27]
is also tested and it fails to yield accurate results (in fact, it yields
values of λ that are too small), which comes as no surprise since it is derived
under the assumption that both λ and a are very close to 1. As can be observed
from the table, the performance loss due to using the optimal forgetting factor
proposed here does not exceed 0.5 dB.
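
The two theoretical forgetting factors can be computed with a few lines of code. The sketch below is a hedged illustration, not the thesis's implementation. It assumes the optimal window length is n_opt = m + sqrt((m² + m + 6σ_v²/σ_o²)/2) with σ_v²/σ_o² = m / ((1 - a²) · SNR) for R = I, that the first expression is 1 - 2/n_opt, and that the second one substitutes n_opt into the endpoint-matching relation λ = 1 - [(1 + √(m/n))² - 2 log(1 + √(m/n)) - 1]/m. All function names are hypothetical.

```python
import math


def optimal_window_length(m, a, snr_db):
    """Assumed form of the optimal sliding window length for R = I."""
    ratio = (m / (1.0 - a * a)) / (10.0 ** (snr_db / 10.0))  # sigma_v^2 / sigma_o^2
    return m + math.sqrt((m * m + m + 6.0 * ratio) / 2.0)


def forgetting_factor_first_order(n):
    """lambda from n_eff = 2 / (1 - lambda) with n_eff = n."""
    return 1.0 - 2.0 / n


def forgetting_factor_exact(n, m):
    """lambda from matching the upper endpoints of the two limiting densities."""
    s = math.sqrt(m / n)
    q = (1.0 + s) ** 2 - 2.0 * math.log(1.0 + s) - 1.0
    return 1.0 - q / m


if __name__ == "__main__":
    m, a = 30, 0.99
    for snr_db in (5.0, 10.0, 20.0):
        n_opt = optimal_window_length(m, a, snr_db)
        lam1 = forgetting_factor_first_order(n_opt)
        lam2 = forgetting_factor_exact(n_opt, m)
        print(f"SNR = {snr_db} dB: lambda(1) = {lam1:.3f}, lambda(2) = {lam2:.3f}")
```

Under these assumptions, the values for a = 0.99 at 5, 10 and 20 dB round to the λ_opt^(1) and λ_opt^(2) entries listed in Table 4.2, and the first value is always the smaller of the two.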

Overall, for relatively small SNR, both expressions give accurate results. On
the other hand, when the SNR is high, the optimal forgetting factor is below
that computed using (4.87) (refer to Fig. 4-22). In that case, λ_opt^(1) is more
accurate because it is always smaller than λ_opt^(2) and hence closer to the
true optimal value.
As a final remark, note that although the characterization of the optimal
forgetting factor is derived for a → 1, the simulations show that it is fairly
accurate for the values of a smaller than 1 for which the RLS algorithm tracks
the corresponding channel reasonably well.


4.8  Conclusions 

An analysis of the Least Squares algorithm when employed to track a time-varying
channel is performed using results and tools from the theory of large
dimensional random matrices. The time-varying channel is modeled as a first
order Markov process and the performance metrics of interest are the mean
square values of the channel estimation and signal prediction errors. These
metrics are characterized for a given number of observations, channel length and
the parameters describing the channel dynamics. The results borrowed from
random matrix theory enable an analysis which relies neither on the direct
averaging assumption nor on the assumption that the expectation of the inverse
of the sample correlation matrix is a scaled inverse of the ensemble correlation
matrix.

The simulation study validates the derived analytical expressions. In addition,
several practical results are revealed by specializing the general theory to
simpler cases. First, an expression for the optimal window length in the
sliding window LS algorithm is derived. Second, based on the comparison between
the exponentially weighted and sliding window LS algorithms, it is conjectured
that the former outperforms the latter if the forgetting factor is appropriately
selected given the sliding window length. The corresponding expression for such
a forgetting factor is derived. Third, a relation between the forgetting factor
used for calculating an exponentially weighted sample correlation matrix and
the effective number of stationary, rectangularly windowed observations is
established. Fourth, this relation is further exploited to evaluate the optimal
value of the forgetting factor in the exponentially weighted LS algorithm.
Finally, the effect of performance deterioration appearing when the number of
observations is close to the channel length (i.e., the number of dimensions) is
observed, theoretically analyzed and intuitively elaborated.
Chapter 5


Channel Equalization

5.1  Introduction 

The wireless communication channels through which signals are transmitted are
often time-varying and characterized by multipath propagation. The multipath
propagation gives rise to a delay spread, resulting in intersymbol interference
(ISI) in the received signal, while the time-variability results in the Doppler
spreading of the signal [49]. These effects are even more pronounced in
underwater acoustic communications where, together with the high latency caused
by the relatively slow speed of propagation (nominally 1500 m/s) and the
frequency dependent attenuation of the transmitted signal, they pose
significant challenges to communication system design [51].

Different techniques have been developed for mitigating these effects [50]. A
survey of the approaches used for the underwater acoustic communication system
design is given in [48]. Most techniques rely in part or completely on channel
equalization, with a Decision Feedback Equalizer (DFE) being the most commonly
used form [39]. A multi-channel DFE (MC-DFE) is one which processes the signals
received at multiple spatially separated sensors. The MC-DFE is particularly
effective at compensating for the ISI induced by the multipath commonly present
in the underwater acoustic communication channel.

One of the main challenges in optimally configuring a multi-channel equalizer is
the choice of the number of sensors and the length of the constituent filters.
Namely, due to channel time variability, the number of coefficients that can be
adapted over the time interval within which the channel is approximately time
invariant is limited, such that a smaller number of coefficients might lead to
better equalization performance.

Another challenge related to optimally configuring the multi-channel equalizer
is the selection of the separation between sensors. While the sensors in a
multiple input multiple output (MIMO) system need to be sufficiently far apart
so that the signals at their outputs are uncorrelated [20], conventional wisdom
is that array processing applications require that sensors be separated by no
more than one half of the shortest wavelength of the received signals [52].
However, because an inherently wideband signal is transmitted through a sparse
underwater acoustic communication channel, selection of the optimal sensor
separation is a more subtle problem.

Although equalizers have been in common use for a while, a great deal of what is
currently known has been learned from simulations and processing of
experimental data, while the analytical results appeared relatively recently.
For instance, [62] studies the signal-to-noise-plus-interference ratio (SINR) at
the output of the LS-based linear equalizer for a time invariant frequency flat
fading channel. A more general analysis in [38] characterizes the SINR at the
output of the LS-based linear equalizer for time invariant frequency selective
channels. Both works exploit random matrix theory results which, as elaborated
in Section 2.7, while theoretically valid in the limit, are fairly accurate in
modeling equalizer performance in practical finite observation time scenarios.
In terms of optimal DFE design, [23] and [24] discuss how the decision delay and
the feed-forward and feedback filter lengths should be selected such that the
signal prediction MSE of the MMSE based DFE equalizer of an LTI channel is
minimized. However, these works inherently assume that the observation period
is infinitely long, which is not the case in non-stationary channels where only
a limited number of stationary observations are available for adaptation.

The contributions of this chapter are threefold. First, the performance of the
least squares (LS)-based MC-DFE equalization method is theoretically analyzed.
The case of linear equalization is included as a special case. The transmission
channel is non-stationary and modeled as a frequency selective filter which is
time-invariant over only short time intervals. The signal prediction error is
adopted as the performance metric and a characterization of its mean square
value is derived. In comparison to [38], in addition to handling the case of
DFE equalization and evaluating the signal prediction MSE, the analysis
technique is different and yields a greatly simplified closed-form result,
which is exact when the received signal is Gaussian distributed. The derived
expression is validated via Monte-Carlo simulations.

The derived expressions quantitatively support the observed performance
characteristic that, when working with signals that have passed through
time-varying channels, an equalizer with relatively short constituent filters
can outperform one using longer filters [40]. The optimal number of taps in the
equalizer's constituent filters is presented as a trade-off between two
competing requirements. On one hand, for a perfectly known environment, the
MMSE error criterion is a non-increasing function of filter length. On the
other hand, the insights from random matrix theory imply that, for a given
number of observation vectors, shorter constituent filters lead to a more
accurate estimate of the correlation matrix and therefore improved performance.

Finally, the chapter analyzes how the number of sensors and the separation
between them impact the equalization performance in a time-varying underwater
acoustic communication channel. Our model for the arrival process takes into
account that the underwater acoustic communication signal received on an array
of sensors is wideband and spatially spread. The signal prediction mean square
error (MSE) is evaluated using this model. An illustration of how the
equalization performance depends on the number of sensors and their separation
for a particular arrival model is then provided. Lastly, the bit error rate
(BER) and signal prediction MSE performance, obtained from processing
experimental data using a multi-channel equalizer with different sensor
separations, are presented. This justifies the conclusion that the equalization
performance is optimized for a non-trivial sensor separation.

Although the problems considered in this chapter are mainly motivated by
underwater acoustic communications, the developed theoretical results and
insights are also applicable to other settings. In particular, the performance
analysis of time varying channel equalization is general and the insights hold
for other applications of least squares based equalizers. Furthermore, the
study of the impact of sensor separation on the performance of multi-channel
equalization is, for example, applicable to the contexts of increasingly
popular 60 GHz ultra-wideband communications [64] and optimal receiver design
based on massive MIMO [31].

The rest of the chapter is organized as follows. A background on the MC-DFE
which describes its structure, analytical framework and updating algorithm is
given in Section 5.2. Section 5.3 presents the performance analysis of the
MC-DFE when adapting using a limited number of observations of the received
signal. Section 5.4 argues that the number of coefficients which optimizes the
equalization performance is a trade-off between two competing requirements. The
model for sparse, wideband and spatially spread arrivals, inherent to the
underwater acoustic environment, is presented in Section 5.5. The impact of
sensor separation and array aperture on the performance of the multi-channel
equalizer is studied in Section 5.6. Finally, Section 5.7 concludes the chapter.

5.2  Background 

The  structure  of  the  multi-channel  decision  feedback  equalizer  (MC-DFE),  analytical 
framework  and  least  squares  based  adaptation  algorithm  are  briefly  described  in  this 
section. 

5.2.1  MC-DFE:  Structure 

The structure of the MC-DFE equalizer is shown in Fig. 5-1. It contains a
feed-forward (FF) filter bank, a feedback (FB) filter and a decision device. The
FF filter bank consists of one linear filter to process the input from each
channel (sensor). The signal received at each sensor is, after some
pre-processing (such as conversion to the baseband), processed by the
corresponding FF filter. The ultimate goal of the FF processing is to
coherently combine the received signal energy and attenuate inter-symbol
interference (ISI) and ambient noise signals. On the other hand, the linear FB
filter processes the equalizer's outputs (i.e., estimates of the transmitted
symbols) with the goal of removing the remaining ISI caused by the channel and
the FF portion of the equalizer. A decision device produces a hard estimate of
the transmitted symbol from a soft-decision estimate, obtained from the combined
FF and FB filtering. All the constituent filters are assumed to be finite
impulse response (FIR) filters.

Figure 5-1: Block diagram of multi-channel decision feedback equalizer.

5.2.2  MC-DFE:  Analytical  Framework 

A mathematical framework of the MC-DFE is presented here [39]. The received
signal is assumed to originate from a single source. The channel between the
source and the i-th sensor (i = 1, 2, ..., N) is modeled with a linear filter
and additive noise. In such a model, the symbol u_i(n), received by sensor i at
discrete time n, is

\[ u_i(n) = \mathbf{g}_i^H(n) \mathbf{x}(n) + v_i(n), \tag{5.1} \]

where g_i(n) is a vector form of the i-th channel impulse response at time n.
The transmitted symbols that give rise to u_i(n) are compactly represented with
a column vector x(n).

Without loss of generality, all the channels are assumed to have the same length
L_c. In addition, we assume the channels have the same lengths of the causal and
anti-causal parts, denoted respectively by L_c^c and L_c^a. Therefore, the
vector x(n) is formatted as

\[ \mathbf{x}(n) = \begin{bmatrix} x(n + L_c^a) & \dots & x(n) & \dots & x(n - L_c^c + 1) \end{bmatrix}^T. \tag{5.2} \]


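Similarly, the FF filters are assumed to have the same length $L_{\mathrm{ff}}$. The lengths of their causal and anti-causal parts, denoted respectively by $L_{\mathrm{ff}}^{c}$ and $L_{\mathrm{ff}}^{a}$, are also the same for all sensors.

The i-th FF filter output at time n is driven by the received symbols $u_i(n + L_{\mathrm{ff}}^{a}), \ldots, u_i(n), u_i(n-1), \ldots, u_i(n - L_{\mathrm{ff}}^{c} + 1)$. They are collected in a column vector $\mathbf{u}_i(n)$, which is expressed as

\[ \mathbf{u}_i(n) = \mathbf{G}_i(n)\,\mathbf{x}(n) + \mathbf{v}_i(n), \tag{5.3} \]

where $\mathbf{G}_i(n)$ is the $L_{\mathrm{ff}}$-by-$(L_{\mathrm{ff}} + L_c - 1)$ channel matrix, obtained by appropriately shifting and stacking $\mathbf{g}_i^H(n + L_{\mathrm{ff}}^{a}), \ldots, \mathbf{g}_i^H(n - L_{\mathrm{ff}}^{c} + 1)$ into its rows. The transmitted symbols $x(n + L_{\mathrm{ff}}^{a} + L_c^{a}), \ldots, x(n - L_{\mathrm{ff}}^{c} - L_c^{c} + 2)$ impacting $\mathbf{u}_i(n)$, with $L_c^{a}$ and $L_c^{c}$ the anti-causal and causal channel lengths, are collected into a column vector $\mathbf{x}(n)$. Similarly, $\mathbf{v}_i(n)$ is a compact vector representation of the noise samples influencing $\mathbf{u}_i(n)$.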
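The "shifting and stacking" construction of the channel matrix can be sketched in a few lines. This is a minimal illustration under simplifying assumptions that are not part of the original text: the channel is real-valued and time-invariant (so every row holds the same impulse response), noise is omitted, and vectors are ordered newest sample first. The names `channel_matrix` and `matvec` and the tap values are hypothetical.

```python
def channel_matrix(g, L_ff):
    """Build the L_ff-by-(L_ff + L_c - 1) matrix whose k-th row is g shifted right by k.

    Assumes a real, time-invariant impulse response g of length L_c.
    """
    L_c = len(g)
    cols = L_ff + L_c - 1
    return [[g[j - k] if 0 <= j - k < L_c else 0.0 for j in range(cols)]
            for k in range(L_ff)]


def matvec(G, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in G]


g = [0.5, 1.0, -0.3]             # hypothetical 3-tap channel (L_c = 3)
G = channel_matrix(g, L_ff=4)    # 4-by-6 matrix, since 4 + 3 - 1 = 6
x = [1, -1, 1, 1, -1, -1]        # the 6 transmitted symbols that influence u_i(n)
u = matvec(G, x)                 # noise-free version of (5.3)
```

Each entry of `u` is the inner product of `g` with a sliding window of `x`, which is why the number of columns is $L_{\mathrm{ff}} + L_c - 1$.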

Stacking the vectors $\mathbf{u}_1(n), \mathbf{u}_2(n), \ldots, \mathbf{u}_N(n)$ into a column vector yields the signal vector $\mathbf{u}(n)$, which represents the signal received at the equalizer's FF section at time n and is given by

\[ \mathbf{u}(n) = \mathbf{G}(n)\,\mathbf{x}(n) + \mathbf{v}(n), \tag{5.4} \]

where the noise vector $\mathbf{v}(n)$ and the $(N L_{\mathrm{ff}})$-by-$(L_{\mathrm{ff}} + L_c - 1)$ multi-channel matrix $\mathbf{G}(n)$ are constructed in a similar manner by stacking the corresponding constituents vertically.

The input to the FB filter at time n is a sequence of the equalizer's estimates $\hat{x}(n-1), \ldots, \hat{x}(n - L_{\mathrm{fb}})$, where $L_{\mathrm{fb}}$ is the FB filter length. These estimates are collected in a column vector $\hat{\mathbf{x}}(n)$. Note that the index n corresponds to the index of the symbol that is being estimated.


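A soft decision estimate $\hat{x}_{\mathrm{soft}}(n)$ is evaluated as

\[ \hat{x}_{\mathrm{soft}}(n) = \mathbf{w}^H(n)\,\mathbf{u}(n), \tag{5.5} \]

where $\mathbf{w}(n) = \left[\, \mathbf{w}_{\mathrm{ff}}^T(n) \;\; -\mathbf{w}_{\mathrm{fb}}^T(n) \,\right]^T$ is the equalizer's weight vector (i.e., its impulse response) and $\mathbf{u}(n) = \left[\, \mathbf{u}_{\mathrm{ff}}^T(n) \;\; \hat{\mathbf{x}}^T(n) \,\right]^T$ is the input to the equalizer at time n. Here $\mathbf{u}_{\mathrm{ff}}(n)$ denotes the FF-section signal vector of (5.4) and, from this point on, $\mathbf{u}(n)$ denotes the full equalizer input. A hard decision estimate $\hat{x}(n)$ is computed from $\hat{x}_{\mathrm{soft}}(n)$ and the constellation diagram of the signaling employed in the communication scheme.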
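The two decision steps can be illustrated with a short sketch. The soft estimate follows (5.5) as the inner product $\mathbf{w}^H\mathbf{u}$. The hard decision is taken here as the nearest constellation point, which is a common rule but an assumption on our part, since the text only says the decision uses the constellation diagram. The function names and numerical values are hypothetical.

```python
def soft_estimate(w, u):
    """Compute w^H u for two equal-length sequences of complex numbers."""
    return sum(wk.conjugate() * uk for wk, uk in zip(w, u))


def hard_decision(x_soft, constellation):
    """Return the constellation point closest to x_soft (assumed decision rule)."""
    return min(constellation, key=lambda s: abs(x_soft - s))


bpsk = [1 + 0j, -1 + 0j]          # binary constellation {+1, -1}
w = [0.6 + 0.1j, -0.2j]           # hypothetical equalizer weights
u = [1.0 + 0.2j, 0.5 - 0.5j]      # hypothetical equalizer input
x_soft = soft_estimate(w, u)
x_hard = hard_decision(x_soft, bpsk)
```

For these values the soft estimate lies in the right half plane, so the nearest-point rule selects the symbol +1.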

As a final remark, note that the overall number of equalizer coefficients is

\[ m = N L_{\mathrm{ff}} + L_{\mathrm{fb}}. \tag{5.6} \]


5.2.3  MC-DFE:  Optimization  of  Weights 

The equalizer weights $\mathbf{w}(n)$ are evaluated with respect to some optimization criterion. Minimization of the mean square error (MSE) between the transmitted symbol $x(n)$ and its soft decision estimate $\hat{x}_{\mathrm{soft}}(n)$ is one of the most popular approaches, in which the weight vector $\mathbf{w}(n)$ is chosen such that the signal prediction mean square error

\[ \sigma^2(n) = E\left[\, |x(n) - \hat{x}_{\mathrm{soft}}(n)|^2 \,\right] \tag{5.7} \]

is minimized. The solution to this optimization problem is referred to as the MMSE receiver and is given by

\[ \mathbf{w}_{\mathrm{MMSE}}(n) = \mathbf{R}^{-1}(n)\,\mathbf{r}(n), \tag{5.8} \]

where $\mathbf{R}(n)$ and $\mathbf{r}(n)$ are, respectively, the ensemble correlation matrix of the input signal and the cross-correlation vector between the input and the desired output signal. They are evaluated as

\[ \mathbf{R}(n) = E\left[\, \mathbf{u}(n)\mathbf{u}^H(n) \,\right], \tag{5.9} \]
\[ \mathbf{r}(n) = E\left[\, \mathbf{u}(n)\,x^*(n) \,\right]. \tag{5.10} \]

The signal prediction MSE of the MMSE filter $\mathbf{w}_{\mathrm{MMSE}}$ is, after substituting (5.8) into (5.7), given by

\[ \sigma^2_{\mathrm{MMSE}}(n) = E\left[\, |x(n)|^2 \,\right] - \mathbf{r}^H(n)\,\mathbf{R}^{-1}(n)\,\mathbf{r}(n). \tag{5.11} \]

The ensemble statistics are rarely known and are replaced by time-average statistics. The cost function for an exponential weighting of the time-average statistics is [27]

\[ C(n) = \sum_{i=1}^{n} \lambda^{n-i} \left| \mathbf{w}^H(n)\mathbf{u}(i) - x(i) \right|^2, \tag{5.12} \]

where $0 < \lambda \le 1$ is a forgetting factor which accommodates the time-variability of the channel by reducing the impact of past data, which is less relevant than current data for the current estimation problem.

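A weight vector which minimizes this cost function is evaluated via

\[ \mathbf{w}(n) = \hat{\mathbf{R}}^{-1}(n)\,\hat{\mathbf{r}}(n), \tag{5.13} \]

where $\hat{\mathbf{R}}(n)$ and $\hat{\mathbf{r}}(n)$ are the exponentially weighted sample correlation matrix (SCM) and input-desired output cross-correlation vector, given by

\[ \hat{\mathbf{R}}(n) = \sum_{i=1}^{n} \lambda^{n-i}\, \mathbf{u}(i)\mathbf{u}^H(i), \tag{5.14} \]
\[ \hat{\mathbf{r}}(n) = \sum_{i=1}^{n} \lambda^{n-i}\, \mathbf{u}(i)\,x^*(i). \tag{5.15} \]

Note that these quantities are also introduced in the context of channel estimation in Chapter 4.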

In the Recursive Least Squares (RLS) implementation of (5.13), the inverse of the SCM is calculated recursively and thus the computational requirements are reduced from order $m^3$ to order $m^2$, where $m$ is the total number of equalizer parameters given in (5.6) [27].
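The recursive computation of the SCM inverse can be sketched as follows. This is the textbook RLS recursion written for real-valued data, not necessarily the exact variant used in the thesis. The initialization constant `delta`, the function names, and the toy data are assumptions made for illustration. Each update costs on the order of the square of the number of weights, because it only involves matrix-vector products and one outer product.

```python
import random


def rls_update(w, P, u, d, lam=1.0):
    """One RLS step: P tracks the inverse of the exponentially weighted SCM."""
    m = len(w)
    Pu = [sum(P[i][j] * u[j] for j in range(m)) for i in range(m)]
    denom = lam + sum(u[i] * Pu[i] for i in range(m))
    gain = [v / denom for v in Pu]
    err = d - sum(w[i] * u[i] for i in range(m))       # a priori error
    w = [w[i] + gain[i] * err for i in range(m)]
    # P is symmetric, so u^T P equals (P u)^T and the update is an outer product.
    P = [[(P[i][j] - gain[i] * Pu[j]) / lam for j in range(m)] for i in range(m)]
    return w, P


def run_rls(inputs, desired, m, lam=1.0, delta=1e-3):
    """Run RLS from w = 0 and P = I / delta (assumed regularized start)."""
    w = [0.0] * m
    P = [[(1.0 / delta if i == j else 0.0) for j in range(m)] for i in range(m)]
    for u, d in zip(inputs, desired):
        w, P = rls_update(w, P, u, d, lam)
    return w


random.seed(0)
w_true = [0.8, -0.4, 0.2]                               # hypothetical target weights
inputs = [[random.gauss(0, 1) for _ in w_true] for _ in range(200)]
desired = [sum(a * b for a, b in zip(w_true, u)) for u in inputs]
w_hat = run_rls(inputs, desired, m=3)                   # close to w_true (noise-free data)
```

With `lam=1.0` the recursion corresponds to the rectangular window; values below one discount older observations.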

The described method of updating equalizer coefficients is referred to as a direct adaptation approach and is the focus of this chapter. In comparison, the equalizer weights can also be calculated based upon the estimates of the channel impulse response for each sensor and the ensemble statistics of the observation noise. This approach is labeled channel estimate based equalization and is studied in [39].


5.3  Performance Analysis of Equalization of Time-Varying Channels

The first part of this section presents the theoretical analysis of the signal prediction MSE corresponding to the MC-DFE operating in a time-varying channel. The obtained characterization is numerically validated via Monte-Carlo simulations in the second part. The theoretical analysis is fairly general and the results hold for any least squares based equalization with non-trivial filter lengths.

5.3.1  Theoretical  Analysis  of  Signal  Prediction  MSE 

The signal prediction MSE in the soft decision (i.e., the summed outputs of the FF and FB filters in Fig. 5-1) is adopted as the performance metric. To model the channel non-stationarity, we assume the channel is time invariant over a finite time interval and that the equalizer's time averaging window used to calculate the filter coefficients is limited to the length of this interval. The observations are processed by the MC-DFE, which operates in training mode. This means that the inputs to the FB filter are the true transmitted symbols, as are the symbols x(i) used in the calculation of the cross-correlation vector in (5.15). Analyzing training mode operation allows the impact of channel time-variability, and thus of limited observation intervals, to be handled more clearly and provides useful insights into performance trade-offs. Other contributions to the performance analysis of equalizers also assume operation in the training mode [39], [38].

Intuitively, when infinitely many observations are used to train the equalizer weights, the LS based MC-DFE with the rectangular window (i.e., a forgetting factor $\lambda = 1$) approaches the MMSE MC-DFE for a time-invariant channel. The impulse response vector $\mathbf{w}_{\mathrm{MMSE}}$ of the MMSE filter and the corresponding signal prediction MSE $\sigma^2_{\mathrm{MMSE}}$ (also called the minimum achievable error (MAE)), given by (5.11), are conditioned on the impulse response of the transmission channel. Recall that the prediction error at the MMSE filter output is white and uncorrelated with the input [27].

A crucial point in our analysis is the observation that the least squares based adaptation of the equalizer's coefficients $\mathbf{w}(n)$ can be framed as a channel identification problem in which the unknown channel is the MMSE filter $\mathbf{w}_{\mathrm{MMSE}}$. This filter processes the input $\mathbf{u}(n)$, and its output is corrupted by a white noise process $q(n)$ of power $\sigma^2_{\mathrm{MMSE}}$, such that the sequence of transmitted symbols $x(n)$ is produced, i.e.,

\[ x(n) = \mathbf{w}_{\mathrm{MMSE}}^H\,\mathbf{u}(n) + q(n). \tag{5.16} \]

Intuitively, as the number of observation vectors u(n) grows, the equalizer weight vector w(n) approaches that of the MMSE equalizer. Consequently, we view the DFE as an adaptive processor which is "trying" to get as close as possible to the MMSE processor. Since the statistics of the received process are unknown and estimated using time-domain averaging, the DFE behaves as an adaptive processor which estimates the MMSE processor based on the received signal and the desired output. This concept is depicted in Fig. 5-2.

Therefore, using (5.5) and (5.16), the signal prediction error $\xi(n)$ is expressed as

\[ \xi(n) = x(n) - \hat{x}_{\mathrm{soft}}(n) = \boldsymbol{\varepsilon}^H(n-1)\,\mathbf{u}(n) + q(n), \tag{5.17} \]

where

\[ \boldsymbol{\varepsilon}(n) = \mathbf{w}_{\mathrm{MMSE}} - \mathbf{w}(n) \tag{5.18} \]

measures how far the estimated equalizer weight vector $\mathbf{w}(n)$ is from the optimal (MMSE) equalizer $\mathbf{w}_{\mathrm{MMSE}}$. We refer to it as the equalizer estimation error.

Figure 5-2: DFE performance analysis.


Since the noise process $q(n)$ is uncorrelated with the received signal $\mathbf{u}(n)$ and the equalizer estimation error $\boldsymbol{\varepsilon}(n-1)$, the signal prediction MSE is given by

\[ E\left[|\xi(n)|^2\right] = E\left[\boldsymbol{\varepsilon}^H(n-1)\,\mathbf{u}(n)\mathbf{u}^H(n)\,\boldsymbol{\varepsilon}(n-1)\right] + \sigma^2_{\mathrm{MMSE}}. \tag{5.19} \]

Using the facts that tr{AB} = tr{BA} for matrices A and B of compatible dimensions, and that the expectation and trace operators commute, the signal prediction MSE is further given by

\[ E\left[|\xi(n)|^2\right] = \mathrm{tr}\left\{ E\left[\mathbf{u}(n)\mathbf{u}^H(n)\,\boldsymbol{\varepsilon}(n-1)\boldsymbol{\varepsilon}^H(n-1)\right] \right\} + \sigma^2_{\mathrm{MMSE}}. \tag{5.20} \]

Assuming that the received signal $\mathbf{u}(n)$ at time n and the equalizer estimation error $\boldsymbol{\varepsilon}(n-1)$ at time n - 1 are uncorrelated, the signal prediction MSE is expressed as

\[ E\left[|\xi(n)|^2\right] = \mathrm{tr}\left\{ \mathbf{R}\, E\left[\boldsymbol{\varepsilon}(n-1)\boldsymbol{\varepsilon}^H(n-1)\right] \right\} + \sigma^2_{\mathrm{MMSE}}, \tag{5.21} \]

where $\mathbf{R}$ is the ensemble correlation matrix of the received process, defined in (5.9). Note that the first term in (5.21) is the signal prediction excess error. It appears because the number of symbols used to train the equalizer's coefficients is finite and the channel is time-varying. The analysis from here on assumes finite n and $\lambda = 1$.


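The equalizer estimation error $\boldsymbol{\varepsilon}(n)$ can be expressed using (5.16) and (5.13) as

\begin{align}
\boldsymbol{\varepsilon}(n) &= \mathbf{w}_{\mathrm{MMSE}} - \hat{\mathbf{R}}^{-1}(n)\,\hat{\mathbf{r}}(n) \nonumber \\
&= \mathbf{w}_{\mathrm{MMSE}} - \hat{\mathbf{R}}^{-1}(n) \sum_{i=1}^{n} \mathbf{u}(i)\left( \mathbf{u}^H(i)\,\mathbf{w}_{\mathrm{MMSE}} + q^*(i) \right) \nonumber \\
&= -\hat{\mathbf{R}}^{-1}(n) \sum_{i=1}^{n} \mathbf{u}(i)\,q^*(i). \tag{5.22}
\end{align}

Using (5.22), the correlation matrix of the equalizer estimation error is evaluated as

\begin{align}
E\left[\boldsymbol{\varepsilon}(n)\boldsymbol{\varepsilon}^H(n)\right] &= E\left[ \hat{\mathbf{R}}^{-1}(n) \sum_{i=1}^{n} \mathbf{u}(i)\,q^*(i) \sum_{j=1}^{n} q(j)\,\mathbf{u}^H(j)\, \hat{\mathbf{R}}^{-1}(n) \right] \nonumber \\
&= \sigma^2_{\mathrm{MMSE}}\, E\left[ \hat{\mathbf{R}}^{-1}(n) \right], \tag{5.23}
\end{align}

where the last equality follows from the fact that the noise process $q(i)$ is white and uncorrelated with the received signal $\mathbf{u}$.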

Note from (5.23) that the correlation matrix of the estimation error of the equalizer weights scales with the MAE. That is, the more difficult the channel (not counting the dynamics), the more difficult it is to adapt and track the corresponding equalizer. Also, the more rapidly the channel fluctuates, the smaller n is, and the larger the expected value of the inverse of the time-averaged correlation matrix.

To proceed further, we employ results which characterize the expectation of the SCM inverse. If the received signal is Gaussian, which happens when the transmitted signal is itself Gaussian and/or the transmission channels $\mathbf{g}_i$ are long enough that their outputs are approximately Gaussian by the central limit theorem, the SCM $\hat{\mathbf{R}}(n)$ is Wishart distributed and the expectation of its inverse is [27]

\[ E\left[ \hat{\mathbf{R}}^{-1}(n) \right] = \frac{\mathbf{R}^{-1}}{n - m - 1}, \tag{5.24} \]

where $m = N L_{\mathrm{ff}} + L_{\mathrm{fb}}$ is the overall number of equalizer coefficients.




For a general non-Gaussian case, the expectation of the SCM inverse is evaluated using random matrix theory results. Namely, it is shown in Section 2.5.2 that if the order of the SCM m and the number of observations n grow large at the same rate such that $m/n \to c \in (0, 1)$, the inverse of the rectangularly windowed SCM almost surely converges to the scaled inverse of the ensemble correlation matrix (2.54). Noting that the SCM as defined in (5.14) with $\lambda = 1$ is a scaled version of the matrix model considered in Section 2.5.2, the convergence result (2.54) is rewritten as (5.25).

As elaborated in Section 2.7, the expected inverse of the SCM is approximated with the limiting quantity in (5.25), which gives (5.26).

Finally, substituting (5.24) into (5.23) and the obtained result into (5.21) yields the exact expression for the signal prediction MSE when the received signal is Gaussian,


\[ E\left[|\xi(n)|^2\right] = \frac{n - 2}{n - (N L_{\mathrm{ff}} + L_{\mathrm{fb}}) - 2}\, \sigma^2_{\mathrm{MMSE}}, \tag{5.27} \]

which is valid when $n > N L_{\mathrm{ff}} + L_{\mathrm{fb}} + 2$. Note that this constraint comes from (5.24) and carries no practical insight.
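The algebra behind the Gaussian result can be written out as a short worked derivation. It is a sketch under two stated assumptions: that the expected inverse of the Gaussian SCM built from n observations equals the inverse of the ensemble correlation matrix divided by n - m - 1, and that the estimation error entering (5.21) is the one at time n - 1, so the SCM is built from n - 1 observations. Both assumptions are consistent with the validity condition n > m + 2 quoted in the text. Here $\hat{\mathbf{R}}$ denotes the sample correlation matrix, $\boldsymbol{\varepsilon}$ the equalizer estimation error, and $\mathbf{I}_m$ the m-by-m identity matrix.

```latex
\begin{align*}
E\left[|\xi(n)|^2\right]
 &= \mathrm{tr}\left\{ \mathbf{R}\,
    E\left[\boldsymbol{\varepsilon}(n-1)\boldsymbol{\varepsilon}^H(n-1)\right] \right\}
    + \sigma^2_{\mathrm{MMSE}} \\
 &= \sigma^2_{\mathrm{MMSE}}\,
    \mathrm{tr}\left\{ \mathbf{R}\, E\left[\hat{\mathbf{R}}^{-1}(n-1)\right] \right\}
    + \sigma^2_{\mathrm{MMSE}} \\
 &= \sigma^2_{\mathrm{MMSE}}\, \frac{\mathrm{tr}\{\mathbf{I}_m\}}{(n-1) - m - 1}
    + \sigma^2_{\mathrm{MMSE}}
  = \left( \frac{m}{n - m - 2} + 1 \right) \sigma^2_{\mathrm{MMSE}}
  = \frac{n-2}{n-m-2}\, \sigma^2_{\mathrm{MMSE}} .
\end{align*}
```

The same steps with the random matrix theory approximation in place of the Wishart result lead to the general-statistics expression, with n - m - 1 in the denominator.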


Similarly, substituting (5.26) into (5.23) and the obtained result into (5.21) yields an approximation for the signal prediction MSE when the received signal has general statistics,

\[ E\left[|\xi(n)|^2\right] \approx \frac{n - 1}{n - (N L_{\mathrm{ff}} + L_{\mathrm{fb}}) - 1}\, \sigma^2_{\mathrm{MMSE}}, \tag{5.28} \]

which is valid for $n > N L_{\mathrm{ff}} + L_{\mathrm{fb}} + 1$. Note that (5.28) also approximates the Gaussian case.
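The proportionality constants can be evaluated numerically with a small helper. This is a sketch that assumes the Gaussian expression has the form (n - 2)/(n - m - 2) and the general approximation the form (n - 1)/(n - m - 1), which is what the stated validity ranges imply; the function names are hypothetical.

```python
def equalizer_size(N, L_ff, L_fb):
    """Total number of coefficients m = N * L_ff + L_fb, as in (5.6)."""
    return N * L_ff + L_fb


def mse_factor_gaussian(n, m):
    """Assumed form of the exact Gaussian factor; requires n > m + 2."""
    if n <= m + 2:
        raise ValueError("need n > m + 2")
    return (n - 2) / (n - m - 2)


def mse_factor_general(n, m):
    """Assumed form of the general-statistics factor; requires n > m + 1."""
    if n <= m + 1:
        raise ValueError("need n > m + 1")
    return (n - 1) / (n - m - 1)


# Single-channel DFE with 22 FF taps and 20 FB taps, trained on 150 symbols.
m_single = equalizer_size(1, 22, 20)
factor_single = mse_factor_general(150, m_single)

# Five-channel DFE with 10 FF taps per sensor and 10 FB taps.
m_multi = equalizer_size(5, 10, 10)
factor_multi = mse_factor_general(150, m_multi)
```

Multiplying either factor by the MAE gives the predicted signal prediction MSE; the factor grows quickly as m approaches n.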


The derived characterization reveals that the signal prediction MSE at the LS-based MC-DFE output is proportional to the signal prediction MSE at the MMSE equalizer output (i.e., the minimum achievable error, $\sigma^2_{\mathrm{MMSE}}$), and the proportionality constant does not depend on the channel impulse response. Note, however, that $\sigma^2_{\mathrm{MMSE}}$ depends on the noise and channel characteristics as well as the lengths of the equalizer's feedforward and feedback filters. A similar relationship between the MSEs at the outputs of the linear equalizer and its corresponding MMSE equalizer is obtained in [38].

5.3.2  Numerical  Validation 

The derived characterization for the signal prediction MSE is validated via simulations. In the first example we consider a single LTI channel whose impulse response is shown in Fig. 5-3. The transmitted symbols are binary {+1, -1} with uniform distribution. They are transmitted through the considered channel and noise is added to the obtained symbols. The Gaussian distributed noise is non-white and generated such that its correlation properties correspond to the ambient ocean noise [12]. The received symbols are processed by a single channel DFE.

The comparison between the simulated and theoretically computed signal prediction MSE when the received SNR is 10 dB and the DFE has 22 taps in the FF filter and 20 taps in the FB filter is shown in Fig. 5-4. As can be observed, the derived characterization (5.28) accurately predicts the MSE performance.

In the second example, we consider the transmission of a binary sequence {+1, -1} through a 5-channel LTI transmission system. The impulse response of one of the channels is shown in Fig. 5-3. The impulse responses of the other channels are obtained by randomly perturbing the impulse response shown in Fig. 5-3. The additive noise is Gaussian distributed and generated such that its power density spectrum corresponds to the ambient ocean noise [12]. The received symbols are processed by a 5-channel DFE.

The comparison between the simulated and theoretically computed signal prediction MSE when the received SNR is 10 dB and the DFE has 10 taps in each sensor FF filter and 10 taps in the FB filter is shown in Fig. 5-5. As can be observed, the derived characterization (5.28) accurately predicts the MSE performance.

Figure 5-3: Channel impulse response used in simulations.

Figure 5-4: Signal prediction MSE versus number of observations for DFE equalizer with 22 taps in FF and 20 taps in FB filter. The transmission channel has length 50 and its impulse response is shown in Fig. 5-3. The noise is colored (ocean ambient noise) and the received SNR is 10 dB.

Figure 5-5: Signal prediction MSE versus number of observations for 5-channel DFE equalizer with 10 taps in each FF and 10 taps in FB filter. The noise is colored (ocean ambient noise) and the received SNR is 10 dB.

5.4  Equalizer Design: Optimal Number of Coefficients

This section considers the problem of determining the number of coefficients in a multi-channel DFE which minimizes the signal prediction MSE. We assume the number of sensors (channels) of the multi-channel equalizer is fixed and analyze how the number of coefficients contained in the equalizer's constituent filters impacts the signal prediction performance.

5.4.1  Insights  into  the  Expression  for  Signal  Prediction  MSE 

Insights into the effects that determine the optimal number of coefficients, i.e., the number which minimizes the signal prediction MSE, can be gained from the characterization of the signal prediction MSE (5.28).

Namely, the constant of proportionality in (5.28) is an increasing function of the overall number of equalizer weights m when the number of observations n is fixed. On the other hand, the MAE $\sigma^2_{\mathrm{MMSE}}$ can be shown to be a monotonically non-increasing function of the equalizer length m. Therefore, the signal prediction MSE is minimized at a point where these two competing effects are balanced.
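The balance between the two effects can be illustrated with a toy search over the equalizer length. The MAE curve below is a made-up non-increasing function with an error floor, chosen only for illustration; it is not derived from any channel in the text. The MSE factor is assumed to be (n - 1)/(n - m - 1), and all names and parameter values are hypothetical.

```python
import math


def predicted_mse(n, m, mae):
    """Predicted MSE: assumed factor (n - 1) / (n - m - 1) times the MAE."""
    return (n - 1) / (n - m - 1) * mae


def toy_mae(m, floor=0.05, gain=1.0, decay=8.0):
    """Hypothetical MAE that decreases with m and saturates at an error floor."""
    return floor + gain * math.exp(-m / decay)


def best_length(n, mae_fn):
    """Search all m with n > m + 1 for the smallest predicted MSE."""
    candidates = range(1, n - 1)
    return min(candidates, key=lambda m: predicted_mse(n, m, mae_fn(m)))


m_short = best_length(50, toy_mae)    # few stationary observations
m_long = best_length(500, toy_mae)    # many stationary observations
```

With fewer stationary observations the search settles on a shorter equalizer, because the growth of the proportionality factor outweighs the small remaining reduction in MAE.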

In other words, the optimal equalizer is a trade-off between the MMSE requirements and those justified by the RMT insights. From the point of view of minimizing the MAE, the fact that the channel has a finite impulse response indicates that the best filter should be an IIR filter. However, when the number of stationary observations, n, is fixed (and controlled by the channel), the SCM estimates the ensemble correlation matrix most accurately if the order of the SCM, m, is 1, i.e., if there is only one equalizer coefficient.

To illustrate how the number of stationary observations, n, impacts the quality of the SCM, we recall the Marčenko-Pastur law from Section 2.5.3. As shown in Fig. 2-1, the spread of the eigenvalues of the SCM corresponding to observations of zero mean, unit power white noise around the ensemble eigenvalue depends on the parameter c, whose inverse represents the number of observations per degree of freedom. Thus, as the value of c decreases, the eigenvalues of the SCM concentrate around the ensemble eigenvalue and hence the SCM more accurately estimates the ensemble correlation matrix. The same reasoning applies when the input process is colored, that is, when the ensemble correlation matrix is not a scaled identity matrix. Again, the eigenvalues of the SCM are spread around their ensemble counterparts and, as the number of observations per dimension, 1/c, increases, the sample eigenvalues concentrate around the ensemble eigenvalues.
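The dependence of the eigenvalue spread on c can be made concrete with the well-known support of the Marčenko-Pastur density. The sketch below assumes zero mean, unit power white noise, an SCM normalized by the number of observations, and c = m/n between 0 and 1; the function name is hypothetical and the code does not reproduce Fig. 2-1.

```python
import math


def mp_support(c):
    """Edges of the Marcenko-Pastur support for unit-variance white noise, 0 < c <= 1."""
    if not 0 < c <= 1:
        raise ValueError("c must lie in (0, 1]")
    root = math.sqrt(c)
    return (1 - root) ** 2, (1 + root) ** 2


for c in (0.5, 0.25, 0.04):
    lo, hi = mp_support(c)
    print(f"c = {c}: sample eigenvalues spread over [{lo:.3f}, {hi:.3f}]")
```

The width of the support is four times the square root of c, so it shrinks toward the single ensemble eigenvalue 1 as the number of observations per dimension grows.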

Another insightful characterization of the signal prediction MSE is obtained by representing it as a sum of the MAE and the excess error, as in (5.20). Namely, (5.28) is equivalently expressed as

\[ E\left[|\xi(n)|^2\right] \approx \sigma^2_{\mathrm{MMSE}} + \frac{m}{n - m - 1}\, \sigma^2_{\mathrm{MMSE}}. \tag{5.29} \]

Note that the excess error is the product of a factor which depends only on the number of equalizer weights m and the averaging interval n, and the MAE $\sigma^2_{\mathrm{MMSE}}$, which is a function of the channel impulse response and the number of coefficients m.

The conclusion that the optimal number of equalizer coefficients is a trade-off between the MMSE requirements and those justified by the RMT insights is numerically illustrated in the following part.

5.4.2  Numerical  Illustration 

In the first example, we consider an LTI channel whose impulse response is shown in Fig. 5-3. The received signal is processed through a single channel DFE. The signal prediction MSE corresponding to the MMSE equalizer coefficients is evaluated using (5.11). Its dependence on the number of taps contained in the FF and FB filters is shown in Fig. 5-6 for an SNR of 0 dB (top figure) and an SNR of 10 dB (bottom figure). As expected, the signal prediction performance improves as the number of equalizer coefficients increases.

A trade-off between the MMSE requirement for longer filters and the random matrix theory insight, which favors shorter equalizers, is visualized by showing the dependence of the signal prediction MSE on the FF and FB filter lengths. This dependence is shown in Fig. 5-7 for a received SNR of 0 dB (top figure) and 10 dB (bottom figure). We assume the adaptation of equalizer weights is performed with 150 stationary symbols.

The number of coefficients in the constituent filters of the optimal DFE which minimizes the signal prediction MSE for different SNRs is shown in Table 5.1. The channel impulse response is as shown in Fig. 5-3 and the averaging interval is 150. As shown in the table, for a given, fixed averaging interval, the optimal number of coefficients increases with SNR. In addition, the optimal distribution of coefficients between the constituent filters with increasing SNR is such that the number of coefficients in the FF filter decreases, while the number of coefficients in the FB filter increases.

Intuitively, the FF filter is the only one that can attenuate the noise. On the other hand, both filters attenuate the inter-symbol interference (ISI). However, the FF filter can only attenuate ISI to the extent that it has a different temporal signature than the desired symbol. Consequently, more coefficients are allocated to the FF filter at low

Figure 5-6: The minimum achievable signal prediction MSE in dB versus number of taps in FF and FB filter of the DFE equalizer for SNR=0 dB (top figure) and SNR=10 dB (bottom figure). The impulse response of the transmission channel is given in Fig. 5-3.




[Figure 5-7 plot area: panels for SNR = 0 dB and SNR = 10 dB; horizontal axis: FB filter length, 10 to 70.]


Figure 5-7: Analytically computed signal prediction MSE in dB versus number of taps in FF and FB filters of the DFE equalizer for SNR=0 dB (top figure) and SNR=10 dB (bottom figure). The impulse response of the transmission channel is given in Fig. 5-3. The number of snapshots (~ number of received symbols) is 150.

Table 5.1: Optimal Number of Coefficients

SNR            0 dB   5 dB   10 dB   12 dB   14 dB   15 dB   20 dB   25 dB
L_ff             22     22      22      16      15      15      15      15
L_fb             28     31      41      48      48      48      48      48
L_ff + L_fb      50     53      63      64      63      63      63      63

Figure 5-8: Channel impulse response at one of the sensors in the KAM11 experiment.

SNR than at high SNR in order to reduce the noise. On the other hand, when the SNR becomes large, the ISI is the dominant source of distortion, so the optimal number of coefficients in the FB filter increases in order to eliminate the ISI.

In the second example we consider optimal MC-DFE design for the underwater acoustic communication channel observed during the KAM11 field experiment [28]. The number of channels (sensors) is 5 and a snapshot of the channel impulse response between the source and one of the sensors is shown in Fig. 5-8. The dependence of the signal prediction MSE on the lengths of the FF and FB filters for a received SNR of 22 dB and experimentally observed spatially correlated ambient noise is shown in Fig. 5-9. The number of stationary symbols used for adaptation is 500. It can be noted that the optimal equalizer uses 16 taps in each FF filter and 10 taps in the FB filter and achieves a signal prediction MSE of -7.33 dB.

Figure 5-9: Signal prediction MSE in dB versus number of taps in FF filter per sensor and number of taps in FB filter. The number of sensors is 5; the channel impulse response and spatially correlated ambient noise are as observed in the field experiment. The received SNR is 22 dB and the channel coherence time is 500 symbols.

5.5  Sparse  Wideband  Channel 

The previous section presents how to optimally choose the number of coefficients in an adaptive equalizer. It is concluded that the optimal number of coefficients is a trade-off between two competing requirements. The presented insights are illustrated with examples in which the number of sensors used in a multi-channel equalizer and the separation between them are fixed and predetermined.

This section develops a theoretical framework suitable for analyzing, and gaining insight into, how the equalization performance depends on the array design, i.e., the selection of the number of sensors and the separation between them. We first present the arrival model consisting of two wideband and spatially spread arrivals impinging on an array of sensors. The input correlation matrix and cross-correlation vector are then evaluated for the described arrival model.

Without loss of generality, this section considers linear multi-channel equalization. However, the model could be extended to include a multi-channel DFE.

5.5.1  Arrival Model


We assume that a single source is isotropically transmitting a wideband signal of power 1, confined within a frequency range between $\omega_L$ and $\omega_U$ rad/s. The one-sided power spectrum density of the transmitted signal is given by¹

\[ P_x(\omega) = \begin{cases} P_0 & \text{if } \omega \in [\omega_L, \omega_U] \\ 0 & \text{otherwise.} \end{cases} \tag{5.30} \]


We assume that on the receiver side, a linear, vertical and uniformly spaced array of sensors with sensor separation d is receiving the transmitted signal. The coordinate system has its origin at the position of the top-most sensor and its z-axis is along the array and oriented downwards. The projection of the spatial wavenumber vector $\mathbf{k}$ onto the z-axis is denoted by $k_z$. The directional cosine is defined as $u = \cos(\theta)$, where $\theta$ is the elevation angle with respect to the array such that $\theta = 90^\circ$ corresponds to the broadside of the array.

To model the received signal, we assume that the underwater acoustic communication system is wideband, as is the case for most single carrier systems. Also, the impulse response of the underwater acoustic communication channel is sparse. Furthermore, processing of data from a variety of underwater acoustic communication experiments shows that the acoustic energy is usually confined to a limited region of the delay-vertical wavenumber domain. As an example, the distribution of acoustic energy received in the KAM11 field experiment [28] across the elevation angle and delay domains, averaged over a time period of 60 seconds, is shown in Fig. 5-10.

Therefore, without loss of generality, we assume that the array receives acoustic energy from two different directions. The acoustic energy corresponding to each arrival is wideband and spatially spread in the angle domain.

¹The power amplifier is part of the communication channel in this model. Hence, the signal at the input of the power amplifier is the transmitted signal in our model.

Figure 5-10: The distribution of acoustic energy received in the field experiment, averaged over a time period of 60 seconds.

To formalize the model, the transmission channel is viewed as a filter whose response in the $\omega$-$k_z$ domain has two non-zero, non-overlapping patches. Each patch gives rise to one arrival. We assume the arrivals come from non-overlapping ranges $[u_{n,l}, u_{n,u}]$, n = 1, 2, in the directional cosine domain. The channel response is thus given by²


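\[ H(\omega, k_z) = \begin{cases} h_1 & \text{if } \omega \in [\omega_L, \omega_U] \text{ and } k_z(\omega) \in \left[\tfrac{\omega}{c} u_{1,l},\, \tfrac{\omega}{c} u_{1,u}\right] \\ h_2\, e^{-j\omega\tau_0(k_z)} & \text{if } \omega \in [\omega_L, \omega_U] \text{ and } k_z(\omega) \in \left[\tfrac{\omega}{c} u_{2,l},\, \tfrac{\omega}{c} u_{2,u}\right] \\ 0 & \text{otherwise,} \end{cases} \tag{5.31} \]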

where $h_1$ and $h_2$ are positive reals and $\tau_0(k_z)$ is the relative delay between the arrivals. Here we assume that the delay within each patch is much smaller than the relative delay between patches such that $\tau_0(k_z) \approx \tau_0$.

Given the channel model, the frequency-wavenumber spectrum of the received signal, $P_y(\omega, k_z)$, is then given by

\[ P_y(\omega, k_z) = |H(\omega, k_z)|^2\, P_x(\omega). \tag{5.32} \]


The presented arrival model can be extended to mimic more realistic cases which include more arrivals, non-flat frequency responses over possibly overlapping patches in the frequency-wavenumber domain, as well as a more general shape of the array of sensors. However, for the purpose of gaining insight into how sensor separation impacts the performance, the most important aspect of the model is that it accounts for the wideband and spatially spread nature of the received signal.

In the following parts, we evaluate the correlation matrix and cross-correlation vector corresponding to the considered model.

²We assume that the power amplification is absorbed in the channel response.

5.5.2  Evaluation of Correlation Matrix R


The ensemble correlation matrix $\mathbf{R}$ is obtained from the space-time correlation function $\rho_{uu^*}(\tau, \Delta p)$ of the received signal, defined as

\[ \rho_{uu^*}(\tau, \Delta p) = E\left[ u(t, p)\, u^*(t - \tau, p - \Delta p) \right], \tag{5.33} \]

where $u(t, p)$ is the continuous signal received at time t and position p along the z-axis. The sample of the continuous time signal $u(t, p)$ received at sensor i at discrete time n is denoted by $u_i(n)$. In general, the space-time correlation function describes the correlation between the signals received at points separated by the vector $\Delta \mathbf{p}$ and at time instances separated by a delay $\tau$. However, we are only interested in the correlation between signals received by sensors in the array. Since the sensors are aligned along the z-axis, the second argument in the space-time correlation function (5.33) is a scalar and describes the z-coordinate of a particular sensor.

The correlation function $\rho_{uu^*}(\tau, \Delta p)$ is evaluated from the frequency-wavenumber
spectrum of the received signal and is given by [52]

$$\rho_{uu^*}(\tau, \Delta p) = \frac{1}{2\pi} \int_{\omega_L}^{\omega_U} S_y(\omega, \Delta p)\, e^{-j\omega\tau}\, d\omega, \tag{5.34}$$

where $S_y(\omega, \Delta p)$ is the temporal spectrum spatial correlation function. Note that
since the $z$-axis is along the array and we are interested in correlations between
signals received on an array, $\mathbf{k}^T \Delta\mathbf{p} = k_z \Delta p$.

The temporal spectrum spatial correlation function is, for the channel response given
in (5.31), evaluated in closed form as

$$S_y(\omega, \Delta p) = \frac{1}{2\pi} \int_{-\omega/c}^{+\omega/c} P_y(\omega, k_z)\, e^{-j k_z \Delta p}\, dk_z = \frac{P_x(\omega)}{\pi} \sum_{n=1}^{2} h_n^2\, \Delta k_{z,n}\, e^{-j k_{z,n} \Delta p}\, \mathrm{sinc}(\Delta k_{z,n} \Delta p), \tag{5.35}$$

where $\mathrm{sinc}(x) = \sin(x)/x$, and the quantities $k_{z,n}$ and $\Delta k_{z,n}$ of (5.36)
are the wavenumbers corresponding to, respectively, the mean and width of the range
of directional cosines pertaining to each arriving patch.

The space-time correlation function $\rho_{uu^*}(\tau, \Delta p)$ is then obtained by substituting
(5.35) into (5.34) and performing the integration. The final result cannot be evaluated
in closed form for the given model and is computed numerically.
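As a minimal sketch of this numerical step, the following Python snippet evaluates a closed-form spatial correlation for flat patches and then integrates it over frequency with the trapezoidal rule. It rests on several assumptions that are not stated in this section: a flat input spectrum over 9 to 17 kHz, negative exponents in both the temporal and spatial phase terms, elevation angles measured from the array axis, and mean and half-width wavenumbers equal to $\omega/c$ times the mean and half-width of each patch's range of directional cosines. The function names, the patch tuples and the grid size are illustrative only.

```python
import numpy as np

def sinc(x):
    """Unnormalized sinc, sin(x)/x, with sinc(0) = 1."""
    return np.sinc(np.asarray(x) / np.pi)  # np.sinc(y) = sin(pi*y)/(pi*y)

def S_y(omega, dp, patches, c=1535.0, Px=1.0):
    """Temporal-spectrum spatial correlation for flat, disjoint patches.

    patches: list of (h, cos_lo, cos_hi); this layout is a hypothetical choice.
    """
    omega = np.asarray(omega, dtype=float)
    total = np.zeros(omega.shape, dtype=complex)
    for h, cos_lo, cos_hi in patches:
        k_mean = omega / c * 0.5 * (cos_hi + cos_lo)  # assumed mean wavenumber
        k_half = omega / c * 0.5 * (cos_hi - cos_lo)  # assumed half-width
        total += h**2 * k_half * np.exp(-1j * k_mean * dp) * sinc(k_half * dp)
    return Px / np.pi * total

def rho_uu(tau, dp, patches, f_lo=9e3, f_hi=17e3, n=4001):
    """Trapezoidal integration of S_y over the signal band."""
    omega = 2.0 * np.pi * np.linspace(f_lo, f_hi, n)
    y = S_y(omega, dp, patches) * np.exp(-1j * omega * tau)
    integral = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(omega))
    return integral / (2.0 * np.pi)

deg = np.pi / 180.0
patches = [(1.0, np.cos(91 * deg), np.cos(89 * deg)),
           (1.0, np.cos(86 * deg), np.cos(84 * deg))]
```

Calling `rho_uu(tau, dp, patches)` on a grid of delays and sensor offsets gives the values needed to populate the correlation matrix; the grid density `n` trades accuracy for run time.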

To specify the correlation matrix R, we first note from the construction of the
signal vector u and definition (5.9) that the correlation matrix R is a block matrix
whose block in position $(i, j)$ is

$$R_{i,j} = E\left[u_i u_j^H\right], \tag{5.37}$$

where $i, j = 1, \ldots, m$. Recalling that the observation vector $u_i$ collects all the samples
of the received signal at sensor $i$ which impact the output of the associated filter at
a particular time instant, the element in position $(t, s)$ of $R_{i,j}$ is given by

$$\left[R_{i,j}\right]_{t,s} = \rho_{uu^*}\big((t - s) T_s,\, (i - j) d\big), \tag{5.38}$$

where $d$ is the sensor separation (i.e., the sampling interval in the spatial domain) and
$T_s$ is the sampling interval in the time domain.
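The block structure described above can be sketched in a few lines of Python. The snippet assumes that the signal vector stacks the per-sensor tap vectors one after another and indexes sensors and taps from 0, which leaves the differences $t - s$ and $i - j$ unchanged. The correlation function `toy_rho` is a hypothetical stand-in used only to exercise the assembly; it is not the model of this chapter.

```python
import numpy as np

def build_R(rho, m, L, d, Ts):
    """Assemble the (m*L) x (m*L) block correlation matrix.

    rho(tau, dp) is any callable returning the space-time correlation for a
    delay tau and a sensor offset dp.
    """
    R = np.empty((m * L, m * L), dtype=complex)
    for i in range(m):              # block row: sensor i
        for j in range(m):          # block column: sensor j
            for t in range(L):      # tap index within sensor i
                for s in range(L):  # tap index within sensor j
                    R[i * L + t, j * L + s] = rho((t - s) * Ts, (i - j) * d)
    return R

# Hypothetical separable correlation, used only to exercise the assembly.
def toy_rho(tau, dp, T=1e-4, D=0.2):
    return np.exp(-abs(tau) / T) * np.exp(-abs(dp) / D)

R = build_R(toy_rho, m=3, L=4, d=0.1, Ts=1 / 40e3)
```

Because every entry depends only on the tap difference and the sensor difference, each block is Toeplitz and the matrix as a whole is block-Toeplitz, so in practice only one row of distinct values needs to be computed.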

5.5.3  Evaluation  of  Cross-Correlation  Vector  r 

The cross-correlation vector r is obtained from the cross-correlation function $\rho_{ux^*}(\tau, p)$,
defined as

$$\rho_{ux^*}(\tau, p) = E\left[u(t, p)\, x^*(t - \tau)\right], \tag{5.39}$$

where $p$ is the position of the considered sensor along the $z$-axis.

Recall that the cross-correlation function between the input and output of a
linear time-invariant filter is given as the inverse Fourier transform of the cross-spectral
density. The cross-spectral density is evaluated as the product of the channel
frequency response and the power spectral density of the input process [26]. Analogously,
the cross-correlation function $\rho_{ux^*}(\tau, p)$ is evaluated as

$$\rho_{ux^*}(\tau, p) = \frac{1}{(2\pi)^2} \int_{\omega_L}^{\omega_U} \int_{-\omega/c}^{+\omega/c} P_x(\omega)\, H(\omega, k_z)\, e^{-j(\omega\tau + k_z p)}\, dk_z\, d\omega. \tag{5.40}$$

Since we are interested in evaluating the statistics of the signals received on the array,
$p$ is a (scalar) distance between the origin and a particular sensor.

Denoting by $S_{ux^*}(\omega, p)$ the solution to the integration in (5.40) over $k_z$, the
cross-correlation function is expressed as

$$\rho_{ux^*}(\tau, p) = \frac{1}{2\pi} \int_{\omega_L}^{\omega_U} S_{ux^*}(\omega, p)\, e^{-j\omega\tau}\, d\omega. \tag{5.41}$$


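The function $S_{ux^*}(\omega, p)$ is, for the considered model (5.31), evaluated in closed
form and given by

$$S_{ux^*}(\omega, p) = \frac{1}{2\pi} \int_{-\omega/c}^{+\omega/c} P_x(\omega)\, H(\omega, k_z)\, e^{-j k_z p}\, dk_z = \frac{P_x(\omega)}{\pi} \sum_{n=1}^{2} h_n\, \Delta k_{z,n}\, e^{-j k_{z,n} p}\, \mathrm{sinc}(\Delta k_{z,n} p), \tag{5.42}$$

where $\Delta k_{z,n}$ and $k_{z,n}$ are as defined in (5.36).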


The cross-correlation function $\rho_{ux^*}(\tau, p)$ is finally obtained by substituting (5.42)
into (5.41) and performing the integration. For the given model, the final result is computed
numerically.


To specify the cross-correlation vector r, we first note that it has a block structure
whose $i$-th block $r_i$ is the cross-correlation vector between the observation vector $u_i$
and the transmitted signal $x$. Since $u_i$ collects the time samples of the signal received
at sensor $i$ which impact the output of the associated filter at a particular time instant,
the $s$-th element of $r_i$ is given by

$$\left[r_i\right]_s = \rho_{ux^*}(s T_s,\, i d), \tag{5.43}$$

where $d$ and $T_s$ are the sampling intervals in, respectively, the spatial and delay domains.
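The stacking of the blocks can be sketched as follows. The snippet assumes that both the sensor index and the delay index start at 0, which is a choice made here for illustration since the text does not fix the index origin, and it uses a hypothetical single-component function `toy_rho_ux` in place of the numerically integrated cross-correlation.

```python
import numpy as np

def build_r(rho_ux, m, L, d, Ts):
    """Stack the per-sensor blocks r_i, each holding L delay samples."""
    return np.array([rho_ux(s * Ts, i * d)
                     for i in range(m)      # sensor index (outer, block)
                     for s in range(L)],    # delay index (inner, within block)
                    dtype=complex)

# Hypothetical cross-correlation: a single plane-wave-like term.
def toy_rho_ux(tau, p, omega0=2 * np.pi * 13e3, kz0=5.0):
    return np.exp(-1j * (omega0 * tau + kz0 * p))

r = build_r(toy_rho_ux, m=3, L=4, d=0.1, Ts=1 / 40e3)
```

The ordering matches a correlation matrix whose blocks are indexed by sensor and whose within-block entries are indexed by tap, so the two can be combined directly in a linear solve.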

5.6  Equalizer  Design:  Optimal  Array  Selection 

This section illustrates how the number of sensors and their separation impact the
signal prediction performance of the MMSE and LS adapted multi-channel equalizer
for a particular arrival structure. The last part presents experimental results which
validate that the equalization performance is optimized for a finite sensor separation
which is not necessarily equal to one half the shortest wavelength.

The  dependence  of  signal  prediction  MSE  on  the  number  of  sensors  and  their 
separation  is  examined  by  using  (5.11)  and  (5.28)  for  a  particular  arrival  structure. 
The  theoretical  framework  presented  in  the  previous  section  is  used  to  evaluate  the 
correlation  matrix  R  and  cross-correlation  vector  r  corresponding  to  the  considered 
arrival  structure. 

5.6.1  Optimal  Sensor  Separation 

One of the challenges in optimally configuring the multi-channel equalizer is
the selection of the separation between sensors. While the sensors in a MIMO system
whose channel is characterized by rich scattering need to be sufficiently far apart so that
the signals at their outputs are uncorrelated [20], conventional wisdom is that array
processing applications require that sensors be separated by no more than one half
the shortest wavelength of the received signals [52]. However, because an inherently
wideband signal is transmitted through a sparse underwater acoustic communication
channel, selection of the optimal sensor separation is a more subtle problem.

As an illustration, we consider a particular arrival model whose bandwidth is
between 9 kHz and 17 kHz, motivated by the KAM11 experiment [28]. For a sound
speed of 1535 m/s, as observed in some of the KAM11 data epochs, the corresponding
wavelengths are between 9.03 cm and 17.06 cm.
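The quoted wavelengths follow from dividing the sound speed by the band-edge frequencies, as the short check below shows. The helper name `wavelength_cm` is introduced here for illustration only.

```python
def wavelength_cm(sound_speed_m_s, frequency_hz):
    """Wavelength in centimetres for a given sound speed and frequency."""
    return 100.0 * sound_speed_m_s / frequency_hz

c = 1535.0                         # sound speed in m/s, from the text
lam_min = wavelength_cm(c, 17e3)   # shortest wavelength (highest frequency)
lam_max = wavelength_cm(c, 9e3)    # longest wavelength (lowest frequency)
half_lam_min = lam_min / 2         # conventional half-wavelength spacing
print(f"{lam_min:.2f} cm, {lam_max:.2f} cm, {half_lam_min:.2f} cm")
```

The half-wavelength spacing at the highest frequency, about 4.51 cm, is the reference against which sensor separations are normalized in the figures of this section.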

Here we consider an arrival model for which the acoustic energy arrives from
the ranges of elevation angles of [89°, 91°] and [84°, 86°]. The corresponding channel
responses are equal, i.e., $h_1 = h_2$, and the relative delay between the arrivals is much
larger than the delay within each patch and is equal to two sampling periods, i.e.,
$\tau_0 = 2/F_s$, where $F_s = 40$ kHz is the sampling frequency. Note that for the given signal
bandwidth of 8 kHz, the arrivals are not resolvable in the delay domain. The ambient
noise is directional and has power 1. It is confined within the range of elevation
angles between 80° and 100°. The signal-to-noise ratio (SNR) is 10 dB. The linear
multi-channel equalizer contains 10 sensors.

The dependence of the signal prediction MSE on sensor separation, when the statistics
of the input process are known such that the equalizer coefficients are evaluated
using (5.8) and the equalizer has 1 tap in each sensor filter, is shown in Fig. 5-11. The
case when each sensor filter contains 5 taps is shown in Fig. 5-12. As can be observed,
the performance is optimized for a finite sensor separation which is greater than one
half the shortest wavelength of the signals of interest.

When  the  statistics  of  the  input  process  are  unknown  and  estimated  from  the 
received  data,  the  signal  prediction  MSE  is  proportional  to  the  signal  prediction  MSE 
corresponding  to  the  MMSE  processor,  as  given  by  (5.28).  Hence,  the  dependence  of 
the  signal  prediction  MSE  on  sensor  separation  for  the  considered  arrival  model  and 
multi-channel  equalizer  is  as  shown  in  Figures  5-11  and  5-12  up  to  an  appropriate 
scaling  of  the  vertical  axes.  The  scaling  depends  on  the  averaging  window  size  and 
overall  number  of  coefficients. 

Finally, as an illustration, the dependence of the signal prediction MSE on the number
of sensors and their separation, when the equalizer weights are computed using the
least squares solution (5.13) and the channel is non-stationary and approximately
time-invariant over 500 symbols, is shown in Fig. 5-13. The signal prediction MSE is
evaluated using (5.28). Under the constraint that each sensor filter has 5 taps, the
signal prediction performance is optimized when the equalizer contains 12 sensors and
their separation is almost $6\lambda_{\min}$, where $\lambda_{\min} = 9.03$ cm. Note the lines
of constant aperture shown in Fig. 5-13. This effect is examined in the following
section.

Figure 5-11: Signal prediction MSE versus sensor separation for MMSE equalizer.
The sensor separation is normalized with the half-wavelength corresponding to the
highest frequency of interest. Each of the 10 single sensor feedforward filters has a
length of 1 tap.

Figure 5-12: Signal prediction MSE versus sensor separation for MMSE equalizer.
The sensor separation is normalized with the half-wavelength corresponding to the
highest frequency of interest. Each of the 10 single sensor feedforward filters has a
length of 5 taps.

Figure 5-13: Signal prediction MSE versus number of sensors and sensor separation
for LS equalizer. The sensor separation is normalized with the half-wavelength
corresponding to the highest frequency of interest. The dashed lines are curves of
constant aperture.


5.6.2  Optimal  Array  Aperture 

The dependence of optimal sensor separation on the number of sensors and number
of taps per sensor filter is illustrated here using the same arrival model as considered
in the previous section. The only distinction is that the relative delay between the two
arrivals is $\tau_0 = 5/F_s$, so that longer sensor filters are needed in order to suppress the
interference in the delay domain. Note that the arrivals are not resolvable in the delay
domain.

The dependence of optimal sensor separation, normalized with half the shortest
wavelength of the received signal, on the number of sensors when each sensor filter
contains $L = 7$ taps is shown in Fig. 5-14. The statistics of the input signal are assumed
known and the optimal sensor separation optimizes (5.11). The input correlation
matrix R and cross-correlation vector r for the considered model are evaluated as
described in Section 5.5.

Figure 5-14: Optimal sensor separation versus number of sensors. Each sensor feedforward
filter has 7 taps. Each point on the blue curve optimizes the signal prediction
MSE in the theoretical model. The red curve is the line of constant aperture, obtained
from averaging the apertures corresponding to each point on the blue curve.

As shown in Fig. 5-14, the optimal sensor separation decreases as the number of
sensors increases in such a way that the array aperture is approximately constant.
This is confirmed by plotting the constant aperture curve in Fig. 5-14, where the
constant aperture is obtained by averaging the optimal apertures evaluated for the
considered numbers of sensors.
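The construction of the constant aperture curve can be sketched as follows. The snippet assumes that the aperture of an $N$-sensor uniform array with separation $d$ is $(N - 1)d$, which is a common definition but is not stated explicitly here, and the listed optimal separations are made-up numbers used only to show the averaging step.

```python
import numpy as np

def constant_aperture_curve(num_sensors, d_opt):
    """Average the optimal apertures and return the matching separation curve."""
    N = np.asarray(num_sensors, dtype=float)
    d = np.asarray(d_opt, dtype=float)
    apertures = (N - 1.0) * d            # assumed definition of aperture
    mean_aperture = apertures.mean()
    return mean_aperture, mean_aperture / (N - 1.0)

# Hypothetical optimal separations (in half-wavelength units), not thesis data.
N = [4, 6, 8, 10, 12]
d_opt = [10.2, 5.9, 4.3, 3.4, 2.7]
A_mean, d_const = constant_aperture_curve(N, d_opt)
```

If the optimal separations closely track `d_const`, the optimal aperture is effectively independent of the number of sensors, which is the behavior described above.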

The dependence of optimal sensor separation on the number of sensors $N$ and
number of taps per sensor filter $L$ in the MMSE processor is shown in Fig. 5-15. As can
be observed, the optimal array aperture is approximately constant and independent of
$L$ when $L \ge 6$. Recall that the relative delay between the arrivals in the considered
example is $5/F_s$ and the taps in a sensor filter are separated by $1/F_s$. Therefore, the
sensor filters containing $L \ge 6$ taps extend over both arrivals in the delay domain
and the processor suppresses the interference reasonably well.
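The tap count at which a sensor filter first spans both arrivals follows from simple counting: a tapped delay line with $L$ taps spaced $1/F_s$ apart covers a delay of $(L - 1)/F_s$. A small sketch, with helper names chosen here for illustration:

```python
def filter_span_samples(num_taps):
    """Delay spanned by a tapped delay line, in sampling periods."""
    return num_taps - 1

def covers_delay(num_taps, delay_samples):
    """True if the filter extends over a path delayed by delay_samples."""
    return filter_span_samples(num_taps) >= delay_samples

delay = 5  # relative delay between the arrivals, in units of 1/F_s
min_taps = next(L for L in range(1, 50) if covers_delay(L, delay))
```

For a relative delay of five sampling periods, six taps are the fewest that reach from the first arrival to the second; shorter filters see only part of the delay spread.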
Figure 5-15: Optimal sensor separation as a function of the number of sensors and
number of taps per sensor filter.

On the other hand, when $L \le 5$, the optimal sensor separation tends to increase
as the length of sensor filters $L$ decreases. In other words, given that the suppression
capability is fairly limited in the delay domain due to short sensor filters, the aperture
of the optimal processor increases so that the interference is better suppressed in the
spatial domain. In addition, when the number of sensors $N$ is not unreasonably small
(i.e., $N$ is above 4 in this example), the optimal aperture remains approximately
constant and depends on $L$.

The  behavior  of  optimal  aperture  remains  the  same  when  equalizer  coefficients  are 
evaluated  using  the  LS  algorithm.  Namely,  recall  that  the  signal  prediction  MSE  of 
the  LS  based  equalizer  is  proportional  to  the  signal  prediction  MSE  corresponding  to 
the  MMSE  processor.  The  constant  of  proportionality  is  a  function  of  the  number  of 
coefficients  and  observation  window  length.  Therefore,  the  optimal  sensor  separation 
which  minimizes  the  signal  prediction  MSE  for  a  given  number  of  sensors  N  and 
number  of  taps  per  sensor  filter  L  is  the  same  for  the  MMSE  and  LS  based  processors. 
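The argument relies on the fact that multiplying a cost curve by a positive constant does not move its minimizer. The sketch below illustrates this with a hypothetical MMSE-versus-separation curve and a hypothetical proportionality constant; neither is taken from (5.11) or (5.28).

```python
import numpy as np

def optimal_separation(d_grid, mse):
    """Separation on the grid that minimizes the given MSE curve."""
    return d_grid[int(np.argmin(mse))]

d_grid = np.linspace(0.5, 12.0, 116)            # candidate separations
mmse_curve = 0.1 + 0.02 * (d_grid - 6.0) ** 2   # hypothetical MMSE curve
scale = 1.7                                     # hypothetical LS/MMSE ratio, independent of d
ls_curve = scale * mmse_curve                   # LS curve is a scaled copy

d_mmse = optimal_separation(d_grid, mmse_curve)
d_ls = optimal_separation(d_grid, ls_curve)
```

The two minimizers coincide for any positive `scale`, provided the constant does not itself depend on the sensor separation.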




Finally, we conclude that the optimal aperture is determined by the arrival structure
of the received signal. As long as the arrival structure remains unchanged during
the observation period, the length of the observation interval does not impact the
aperture of the optimal multi-channel processor.

5.6.3  Experimental  Evidence 

For experimental evidence that sensor separation is an important factor in determining
equalizer performance, the following results obtained from processing the data
collected in the KAM11 field experiment [28] are useful. The underwater acoustic
communication signal received at a vertical, linear and uniformly spaced array is processed
through a linear multi-channel equalizer and the transmitted symbols are detected.
The equalizer contains 4 sensors and each sensor filter has 20 taps. The equalizer
coefficients are adapted using the recursive least squares (RLS) algorithm with the
forgetting factor $\lambda$ which yields the best bit error rate performance. The measured
signal prediction MSE and BER performance versus sensor separation are shown in
the top and bottom parts of Fig. 5-16, respectively. Note that the best signal prediction
MSE and BER performance is achieved for $d = 30$ cm. The minimal and maximal
wavelengths corresponding to the signal of interest in the KAM11 experiment are,
respectively, 9.03 cm and 17.06 cm. Note that the optimal separation is approximately
$6\lambda_{\min}/2$, which corresponds to the optimal sensor separation obtained for the arrival
structure considered in the previous part (refer to Fig. 5-12). As a final remark,
the linear equalizer is implemented in the time domain and no Doppler compensation is
employed.

5.7  Conclusions 

The performance analysis and optimal design of multi-channel equalizers of time-varying
channels is presented in this chapter. An analytical characterization of the
signal prediction MSE achieved at the output of the multi-channel DFE equalizer
operating in a non-stationary channel is first presented. The channel is modeled as
a frequency selective filter which is time invariant over short time intervals such that
the number of stationary observations used to adapt the equalizer weights is finite
and limited. The equalization problem is framed as a channel identification problem
where the "unknown" channel is the MMSE equalizer. It is shown that the signal
prediction MSE at the output of the LS-based MC-DFE equalizer is proportional to
the signal prediction MSE at the output of the corresponding MMSE equalizer and
the proportionality constant does not depend on the channel impulse response. The
derived expression is validated via Monte-Carlo simulations.

Figure 5-16: Experimentally measured signal prediction MSE (top figure) and BER
(bottom figure) versus sensor separation (in cm) for a 4-sensor linear equalizer with 20 taps
per sensor filter.

Further, the chapter studies the problem of optimal equalizer design. Namely,
the obtained characterization of the equalization performance highlights that the
optimal number of equalizer coefficients is a trade-off between the MMSE requirement
for longer filters and the insight that a finite number of observations can effectively
adapt only a limited number of equalizer weights. This is validated via Monte-Carlo
simulations and processing of experimentally collected data.

Finally, an analysis of how the performance of a multi-channel equalizer of time-varying
underwater acoustic communication channels depends on the number of sensors and
the separation between them is presented. While the sensors in a conventional MIMO
system need to be sufficiently far apart so that the signals at their outputs are uncorrelated,
conventional wisdom is that array processing applications require that sensors
be separated by no more than one half the shortest wavelength of the received signals.
However, the selection of optimal sensor separation is a more subtle problem in the
context of underwater acoustic communications. To study this problem, we introduce
an arrival model which accounts for the wideband and spatially spread nature of
the received underwater acoustic communication signals. Using a particular arrival
structure, we show that the performance is optimized for a non-trivial selection of
the number of sensors and their separation. Finally, these findings are confirmed by
processing the experimental data.

As possible future research, the presented arrival model could be extended such
that the dependence of the optimal number of sensors on the total wavenumber spread and
the wavenumber spread of the arrivals we aim to separate is revealed. Also, the arrival
model could be generalized to mimic more realistic arrival structures so as to
include more arrivals, a non-flat frequency response over the patches in the frequency-wavenumber
domain, and to account for non-negligible delay within the arrival. A
possible application of the developed results and insights might be in the design of
subarray-based equalization methods for rapidly varying communication channels.

Although the emphasis in this chapter is on underwater acoustic communications,
the derived characterizations and gained insights are applicable in other settings. As
such, the derived characterization of equalization performance holds in other applications
that rely on least squares based adaptation in the deficient sample support
regime using non-trivial constituent filter lengths.

Another potential application area is ultra-wideband communications and, in
particular, the increasingly popular 60 GHz communications. Namely, since an underwater
acoustic communication channel is wideband, it shares some features with an
ultra-wideband radio communication channel. Specifically, both channels are sparse
and the received signals consist of a limited number of wideband and spatially spread
arrivals. In addition, given that the receivers in ultra-wideband communication
systems are envisioned to contain many antennas (i.e., massive MIMO), the
optimal receiver design may benefit from our analysis and the insights into how sensor
separation impacts the equalization performance.
Chapter 6


Conclusions  and  Future  Work 


The coefficients in adaptive processors are often obtained as a solution to an optimization
problem whose objective function is based on the second order statistics of
the input process. Second order statistics are usually used because they arise naturally
in squared error criteria involving linear signal and processing models, and because the
estimation of higher order statistics is often difficult when the data available for
estimating them is limited. In addition, adaptive processing with higher order statistics
requires more computation, which is usually a scarce resource.

The adaptive processor weights, obtained as a solution to an optimization problem
with an objective function based on second order statistics, depend on the ensemble
correlation matrix of the input process. However, the ensemble correlation matrix
is often unknown and is estimated from the observed data. The sample correlation
matrix (SCM) is a widely used estimator, or a building block in other estimators, of the
ensemble correlation matrix. The SCM accurately estimates the ensemble correlation
matrix when the number of observations is many times larger than the
number of coefficients, which is rarely the case in practice.
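The effect of sample support on the SCM can be illustrated with a small numerical experiment. The sketch below assumes zero-mean circular complex Gaussian observations with an identity ensemble correlation matrix; the dimensions, the random seed and the error measure (relative Frobenius norm) are choices made here for illustration.

```python
import numpy as np

def sample_correlation_matrix(U):
    """SCM of the columns of U (m coefficients by n observations)."""
    n = U.shape[1]
    return U @ U.conj().T / n

def relative_error(R_hat, R):
    """Relative Frobenius-norm distance between estimate and ensemble matrix."""
    return np.linalg.norm(R_hat - R) / np.linalg.norm(R)

rng = np.random.default_rng(0)
m = 8
R = np.eye(m)  # hypothetical ensemble correlation matrix

def draw(n):
    """n unit-power circular complex Gaussian observation vectors."""
    return (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)

err_rich = relative_error(sample_correlation_matrix(draw(100 * m)), R)  # n = 100 m
err_poor = relative_error(sample_correlation_matrix(draw(2 * m)), R)    # n = 2 m
```

With a hundred observations per coefficient the SCM is close to the ensemble matrix, whereas with two observations per coefficient the relative error is several times larger.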

The SCM evaluated from deficient sample support might significantly differ from
the ensemble correlation matrix. The problem of deficient sample support arises as a
result of one or more of the following reasons. First, the statistics of the input signal
might be non-stationary because the signal has propagated through a time-varying
environment, such as in terrestrial wireless communications. Second, the length of
the observation interval that can be used to estimate the time-varying statistics might
not be sufficient, such as in medical applications. Finally, the number of dimensions
might be very large so that the number of observations is small compared to the
number of dimensions, such as in modern radar and sonar systems.

This thesis studied the problems associated with adaptive processing based on
second order statistics in the deficient sample support regime. The applications of
adaptive processing considered in the thesis were adaptive beamforming for spatial
spectrum estimation, tracking of time-varying channels and equalization of communication
channels. More specifically, the thesis analyzed the performance of the considered
adaptive processors when operating in the deficient sample support regime.
In addition, it gained insights into the behavior of different estimators based on the
estimated second order statistics. Finally, it studied how to optimize the adaptive
processors and algorithms so as to account for deficient sample support and consequently
improve the performance.


6.1  Random  Matrix  Theory 

The problems of adaptive processing in the deficient sample support regime have been
analyzed and addressed by exploiting the results and insights from random matrix theory.
Random matrix theory is a mathematical area which studies how the eigenvalues and
eigenvectors of a random matrix behave. It shows that certain encodings of the
eigenvalues and eigenvectors of some random matrix ensembles exhibit deterministic
behavior in the limit when the order of the matrix grows large. The estimate of the
ensemble correlation matrix, the SCM, is the random matrix of interest here. In that
sense, this thesis could be viewed as a study of what random matrix theory can teach
us about adaptive processing in the deficient sample support regime.
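A well-known instance of such deterministic limiting behavior is the Marchenko-Pastur law: for an SCM built from i.i.d. unit-variance entries, the eigenvalues concentrate on the interval $[(1 - \sqrt{c})^2, (1 + \sqrt{c})^2]$, where $c$ is the ratio of the number of coefficients to the number of observations. The sketch below checks this numerically; it is offered as a textbook illustration under the stated i.i.d. assumption, with dimensions and seed chosen arbitrarily, and is not a reproduction of the specific SCM models analyzed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 200, 800                      # hypothetical dimensions, ratio c = m/n
c = m / n
U = rng.standard_normal((m, n))      # i.i.d. unit-variance entries (identity ensemble correlation)
eigs = np.linalg.eigvalsh(U @ U.T / n)

lower_edge = (1 - np.sqrt(c)) ** 2   # Marchenko-Pastur support edges
upper_edge = (1 + np.sqrt(c)) ** 2
```

Although the ensemble correlation matrix has all eigenvalues equal to one, the SCM eigenvalues spread over the whole interval between the two edges, and the spread widens as $c$ grows toward one.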

More specifically, we have utilized the characterizations of the limiting behavior
of eigenvalue and eigenvector Stieltjes transforms corresponding to different SCM
models. The considered SCM models involve unity or non-unity forgetting factor
and have zero or non-zero diagonal loading. Furthermore, the limiting moments
corresponding to these SCM models have been evaluated using the Stieltjes transform
method. Finally, the Gaussian tools characterizing the expected value and variance
of a functional of a Gaussian vector have been exploited in characterizing the behavior
of the deviation of the quadratic forms involving the inverse of the SCM from their
asymptotic counterparts.

In  the  following,  we  briefly  summarize  the  problems  considered  in  each  application 
of  adaptive  processing  and  enumerate  the  contributions. 


6.2  Spatial  Power  Spectrum  Estimation 

In the context of adaptive spatial power spectrum estimation, we have focused on
two commonly used spatial power spectrum estimators based on the MPDR beamformer
and studied how regularization in the form of diagonal loading, introduced
with the goal of alleviating the problems caused by deficient sample support, impacts
the estimation performance. The following results have been obtained.
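For orientation, one widely used textbook form of an MPDR (Capon) power estimate with diagonal loading is $1 / \left(v^H (\hat{R} + \delta I)^{-1} v\right)$, where $v$ is the steering vector and $\delta$ the loading. The sketch below implements that form only; the two estimators studied in the thesis are not defined in this chapter and may differ from it, and the steering vector and matrix used here are hypothetical placeholders.

```python
import numpy as np

def mpdr_power(R_hat, v, loading=0.0):
    """Capon/MPDR power estimate 1 / (v^H (R_hat + loading*I)^{-1} v)."""
    m = R_hat.shape[0]
    x = np.linalg.solve(R_hat + loading * np.eye(m), v)  # (R_hat + loading*I)^{-1} v
    return 1.0 / np.real(np.vdot(v, x))                  # np.vdot conjugates its first argument

m = 4
v = np.ones(m, dtype=complex)        # hypothetical broadside steering vector
R_hat = np.eye(m, dtype=complex)     # stand-in for a sample correlation matrix
p_unloaded = mpdr_power(R_hat, v)
p_loaded = mpdr_power(R_hat, v, loading=1.0)
```

Solving the linear system instead of forming the inverse explicitly is the usual numerical choice, and a positive loading keeps the system well conditioned even when the SCM is rank deficient.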

First, it has been shown that both power estimators almost surely converge to
deterministic limits when the number of sensors and the number of observations (snapshots)
grow large at the same rate. Given that both power estimators are bounded for a finite value
of diagonal loading, it has been concluded that the expectations of the power estimators
converge to the same limits. Although these results are asymptotic, as are the other results in
large dimensional random matrix theory, convergence is rapid, so the deterministic
limits fairly accurately approximate the expected values of the power estimators for finite
and relatively small numbers of sensors and snapshots.

Second, under the assumption that the snapshots are Gaussian distributed, the
rate of convergence of one of the power estimators to its deterministic limit, its variance
and its estimation mean square error have been characterized. In doing so, the
Gaussian tools which characterize the expectation and variance of a functional of a
Gaussian vector have been exploited.

Third, a variety of results characterizing how the power estimators, their expectations
and their variances depend on diagonal loading have been obtained. We have
conjectured from these results that the variance has negligible impact on the value of
the optimal diagonal loading which minimizes the estimation mean square error, meaning
that the optimal diagonal loading is controlled by the squared bias. The performance
loss incurred by using the diagonal loading which minimizes the squared bias instead
of the one which optimizes the estimation MSE has been shown to be insignificant.

Fourth, using the stated conjecture, we have investigated how the optimal diagonal
loading depends on steering direction when the arrival model consists of plane waves
contaminated with uncorrelated noise. We have shown that when steering close to
the source direction, the optimal loading is small and can even be zero, meaning that
the optimal estimator tends to perform adaptation as much as possible, fully relying
on the available data. On the other hand, when the steering direction moves away from
the source direction, the optimal loading tends to increase and follows an oscillatory
behavior.

Finally, the MSE performances of the two power estimators have been compared
and it has been shown that the estimator $P_b$ performs better (lower MSE) than the
estimator $P_a$ at the expense of increased sensitivity to optimal diagonal loading.

All  obtained  results  are  validated  via  Monte-Carlo  simulations. 


6.3 Time-Varying Channel Tracking

In the context of the time-varying channel tracking problem, the performance of the
Recursive Least Squares algorithm when used to estimate and track a channel
whose variations are modeled with a first order Markov process has been studied and
the following results have been obtained.

First, the channel estimation and signal prediction mean square errors are characterized
in the deficient sample support regime.

Second, the general results are applied to specific scenarios and, as special cases,
the tracking performance in the steady state, the performance of the LS-based identification
of a linear time-invariant channel and the performance of the sliding window RLS
algorithm are characterized.
Third, several practical results have been obtained using the developed
analysis. An expression for the optimal window length in the sliding window LS
algorithm has been derived. Also, based on the comparison between the
exponentially weighted and sliding window LS algorithms, it has been conjectured
that the former outperforms the latter if the forgetting factor is appropriately
selected given the sliding window length. The corresponding expression for such
a forgetting factor has been derived. Furthermore, a relation between the
forgetting factor used for calculating an exponentially weighted sample
correlation matrix and the effective number of stationary, rectangularly
windowed observations has been established. This relation has been further
exploited to evaluate the optimal value of the forgetting factor in the
exponentially weighted LS algorithm. Finally, the performance deterioration that
appears when the number of observations is close to the channel length (i.e.,
the number of dimensions) is observed, theoretically analyzed, and intuitively
explained.
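
As an illustration of how a forgetting factor can be mapped to an equivalent number of rectangularly windowed observations, the sketch below uses Kish's effective sample size, $(\sum_i w_i)^2 / \sum_i w_i^2$, for the weights $w_i = \lambda^i$. This particular mapping is an assumption made here for illustration and is not necessarily the relation derived in the thesis; the function names are hypothetical.

```python
import numpy as np

def effective_observations(forgetting_factor: float, n_terms: int = 100_000) -> float:
    """Kish effective sample size (sum w)^2 / sum w^2 for weights w_i = lambda^i."""
    w = forgetting_factor ** np.arange(n_terms)
    return float(w.sum() ** 2 / (w ** 2).sum())

def closed_form(forgetting_factor: float) -> float:
    """Infinite-horizon limit of the ratio above: (1 + lambda) / (1 - lambda)."""
    return (1.0 + forgetting_factor) / (1.0 - forgetting_factor)

def forgetting_factor_for_window(window_length: float) -> float:
    """Invert the closed form: lambda whose effective window has the given length."""
    return (window_length - 1.0) / (window_length + 1.0)

# Example: compare the truncated sum with the closed form for lambda = 0.99.
numeric = effective_observations(0.99)
analytic = closed_form(0.99)
lam_for_199 = forgetting_factor_for_window(199.0)
print(numeric, analytic, lam_for_199)
```

Under this assumed mapping, a forgetting factor close to one corresponds to a long effective window, which is the qualitative behavior the relation in the text describes.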

All  obtained  results  are  validated  via  Monte-Carlo  simulations. 


6.4  Channel  Equalization 

In the context of time-varying channel equalization, the problem of optimally
configuring a multi-channel Decision Feedback Equalizer (MC-DFE) whose weights
are evaluated and adapted using the least squares algorithm in the deficient
sample support regime has been studied, and the following results have been
obtained.

First, the performance of the least squares based multi-channel DFE is
theoretically characterized. It has been shown that the signal prediction MSE at
the output of the LS-based MC-DFE equalizer is proportional to the signal
prediction MSE at the output of the corresponding MMSE equalizer, and that the
proportionality constant does not depend on the channel impulse response. The
derived expression is validated via Monte-Carlo simulations.
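
The channel-independence of the proportionality constant can be illustrated with a small Monte-Carlo experiment. The sketch below is a simplified stand-in, not the MC-DFE studied in the thesis: it assumes a real-valued linear model with white Gaussian regressors, for which the classical finite-sample constant is $(n-1)/(n-m-1)$. That constant, the function name and all parameter values are assumptions made here for illustration.

```python
import numpy as np

def ls_prediction_mse_ratio(w_true, n, noise_var, trials, rng):
    """Monte-Carlo estimate of (LS prediction MSE) / (MMSE) for white regressors."""
    m = w_true.size
    excess = 0.0
    for _ in range(trials):
        X = rng.standard_normal((n, m))                # n observations of m regressors
        d = X @ w_true + np.sqrt(noise_var) * rng.standard_normal(n)
        w_ls, *_ = np.linalg.lstsq(X, d, rcond=None)   # LS weights from n samples
        # With unit-covariance regressors, the excess prediction MSE on a fresh
        # sample equals the squared weight error.
        excess += np.sum((w_ls - w_true) ** 2)
    return (noise_var + excess / trials) / noise_var

rng = np.random.default_rng(1)
m, n, noise_var = 10, 40, 0.1
theory = (n - 1) / (n - m - 1)                         # classical constant, assumed here
ratio_a = ls_prediction_mse_ratio(rng.standard_normal(m), n, noise_var, 2000, rng)
ratio_b = ls_prediction_mse_ratio(5.0 * np.ones(m), n, noise_var, 2000, rng)
print(theory, ratio_a, ratio_b)
```

The two runs use different weight vectors (playing the role of different channels), and both ratios should agree with the same constant up to Monte-Carlo error.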

Second, using the obtained characterization of the equalization performance, it
has been highlighted that the optimal number of equalizer coefficients is a
trade-off between the MMSE requirement for longer filters and the insight that a
finite number of observations can effectively adapt only a limited number of
equalizer weights. This is validated via Monte-Carlo simulations and processing
of experimentally collected data.

Finally, it has been shown that, as opposed to standard MIMO systems and
conventional wisdom in array processing, choosing the sensor separation that
minimizes the signal prediction MSE is a more subtle problem. To study it, we
have introduced an arrival model which accounts for the wideband and spatially
spread nature of the received underwater acoustic communication signals. Using a
particular arrival structure, we have shown that the performance is optimized
for a specific selection of the number of sensors and their separation. These
findings have been confirmed by the processing of experimental data.


6.5  Future  Work 

In terms of future work, some of the results on optimal diagonal loading for
spatial power spectrum estimation could be strengthened and extended in several
ways. In the shorter term, the stated conjecture about the negligible impact of
variance on the value of the optimal diagonal loading needs to be rigorously
proved. In addition, a rigorous analysis of the sensitivity of the power
estimators to the optimal diagonal loading is needed to complete the comparison
of the power estimators. Finally, the results developed for Gaussian distributed
snapshots could possibly be extended to more general snapshot statistics.

In the longer term, the ultimate goal concerning the problem of diagonal loading
for adaptive beamforming is to develop a scheme which determines, in real time,
the optimal diagonal loading to be used in the computation of the beamformer's
weights based on the received data and steering direction. Furthermore, the
presented study on how spatial power spectrum estimation depends on diagonal
loading could be applied to other estimation methods which rely on diagonal
loading and possibly generalized to other regularization approaches. Finally,
the analysis of other regularization methods applied to adaptive beamforming and
spatial power spectrum estimation could possibly benefit from the tools and
approaches used in the thesis.

In the context of optimal equalizer design, the presented arrival model could be
extended to reveal how the optimal number of sensors depends on the total
wavenumber spread and on the wavenumber spread of the arrivals we aim to
separate. Also, the arrival model could be generalized to mimic more realistic
arrival structures so as to include more arrivals, a non-flat frequency response
over the patches in the frequency-wavenumber domain, and a non-negligible delay
within the arrival. A possible application of the developed results and insights
might be in the design of a subarray-based equalization method for rapidly
varying communication channels.

The derived results and obtained insights in the problem of time-varying channel
equalization could possibly be used for addressing problems arising in other
applications. Namely, the derived characterization of the signal prediction MSE
holds in other applications which use least squares based adaptation and
consequently could be applied for studying issues caused by deficient sample
support. In addition, the model developed for studying how the number of sensors
and the separation between them impact the equalization performance could be
adjusted to model sparse and spatially spread arrivals inherent to signals
received in an ultra-wideband terrestrial communication system. An example is a
communication system in the increasingly popular 60 GHz frequency band. Possible
problems that might be tackled using the developed arrival model are equalizer
design and antenna selection in a massive MIMO receiver.
Appendix A


Proof  of  Lemma  3.5 


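A quadratic form $Q_k$ can be written in terms of the eigen-decomposition of the
SCM $\hat{R}$ as

$$Q_k(\delta) = \sum_{i=1}^{m} \frac{|s^H \hat{q}_i|^2}{(\hat{\lambda}_i + \delta)^k}.$$

Due to assumptions 2 and 3, $d_m \le \hat{\lambda}_i \le D_m$ and thus

$$\frac{m}{(D_m + \delta)^k} \le Q_k(\delta) \le \frac{m}{(d_m + \delta)^k}. \tag{A.1}$$

The derivatives of the estimators $P_a$ and $P_b$ are expressed in terms of the
quantities $Q_k$ as

$$\frac{\partial P_a}{\partial \delta} = \frac{2\delta}{Q_1^2}\left(Q_3 - \frac{Q_2^2}{Q_1}\right), \tag{A.2}$$

$$\frac{\partial P_b}{\partial \delta} = \frac{Q_2}{Q_1^2}. \tag{A.3}$$

Combining (A.1) with (A.2) and (A.3) yields

$$0 \le \frac{\partial P_a}{\partial \delta} \le \frac{2\delta (D_m + \delta)^2}{m}\left[\frac{1}{(d_m + \delta)^3} - \frac{1}{(D_m + \delta)^3}\right] \tag{A.4}$$

and

$$\frac{1}{m}\left(\frac{d_m + \delta}{D_m + \delta}\right)^2 \le \frac{\partial P_b}{\partial \delta} \le \frac{1}{m}\left(\frac{D_m + \delta}{d_m + \delta}\right)^2, \tag{A.5}$$

which completes the proof.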
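
The quantities used in this proof can be checked numerically. The sketch below assumes $Q_k(\delta) = s^H(\hat{R} + \delta I)^{-k}s$ with a unit-modulus signal replica vector (so that $\|s\|^2 = m$), $P_b = 1/Q_1$ and $P_a = (Q_1 - \delta Q_2)/Q_1^2$. These forms are inferred here from the derivative expressions and are assumptions for illustration; the function names, the array geometry and the parameter values are hypothetical.

```python
import numpy as np

def quadratic_forms(R_hat, s, delta, orders=(1, 2, 3)):
    """Q_k(delta) = s^H (R_hat + delta I)^{-k} s via the eigen-decomposition of R_hat."""
    lam, Q = np.linalg.eigh(R_hat)
    proj = np.abs(Q.conj().T @ s) ** 2            # |s^H q_i|^2
    return {k: float(np.sum(proj / (lam + delta) ** k)) for k in orders}

def power_estimators(R_hat, s, delta):
    """Assumed estimator forms: P_a = (Q1 - delta Q2) / Q1^2 and P_b = 1 / Q1."""
    q = quadratic_forms(R_hat, s, delta)
    return (q[1] - delta * q[2]) / q[1] ** 2, 1.0 / q[1]

def derivatives(R_hat, s, delta):
    """Closed-form derivatives of P_a and P_b with respect to delta."""
    q = quadratic_forms(R_hat, s, delta)
    dp_a = 2 * delta / q[1] ** 2 * (q[3] - q[2] ** 2 / q[1])
    dp_b = q[2] / q[1] ** 2
    return dp_a, dp_b

rng = np.random.default_rng(0)
m, n, delta, h = 8, 20, 0.5, 1e-6
Y = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
R_hat = Y @ Y.conj().T / n                        # sample correlation matrix
s = np.exp(1j * np.pi * np.arange(m) * np.sin(0.3))   # unit-modulus replica, ||s||^2 = m

dp_a, dp_b = derivatives(R_hat, s, delta)
pa_p, pb_p = power_estimators(R_hat, s, delta + h)
pa_m, pb_m = power_estimators(R_hat, s, delta - h)
fd_a, fd_b = (pa_p - pa_m) / (2 * h), (pb_p - pb_m) / (2 * h)   # central differences

lam = np.linalg.eigvalsh(R_hat)
d_m, D_m = lam[0], lam[-1]                        # eigenvalue bounds for this realization
lower_b = ((d_m + delta) / (D_m + delta)) ** 2 / m
upper_b = ((D_m + delta) / (d_m + delta)) ** 2 / m
upper_a = 2 * delta * (D_m + delta) ** 2 / m * (
    1 / (d_m + delta) ** 3 - 1 / (D_m + delta) ** 3)
print(dp_a, fd_a, dp_b, fd_b)
```

Here the smallest and largest sample eigenvalues of the realization play the role of $d_m$ and $D_m$, so the bounds hold for this realization rather than almost surely.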
Appendix B


Proof  of  Lemma  3.6 


The claims are proved by using the Poincaré-Nash inequality (2.67), wherein
$f(Y)$ is one of the power estimators or their derivatives. Since the received
snapshots are Gaussian distributed, we use the equivalent representation of the
quadratic forms given in (3.21).

In proving Lemma 3.6, the following facts are used:

•  The norm of the resolvent matrix is almost surely upper bounded by 1, i.e.,

$$\|H\| \le 1. \tag{B.1}$$

To confirm, note that all its eigenvalues lie between zero and one.

•  The derivative of $H_{pq}$ with respect to $Y_{ij}^*$ is given by [25]

^  =  (B.2) 

•  For $k > l$,

$$\frac{Q_k}{Q_l} \le t^{k-l}. \tag{B.3}$$

This can be checked by writing $Q_k$ and $Q_l$ in terms of their
eigendecompositions and noting that $(1 + t\lambda_i)^k \ge (1 + t\lambda_i)^l$
because $1 + t\lambda_i \ge 1$.
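
The norm bound and the ratio bound for the quadratic forms can be checked numerically. The sketch below assumes that the resolvent has the form $H = (I + t\hat{R})^{-1}$ and that $Q_k = t^k s^H H^k s$, with the ratio bound read as $Q_k/Q_l \le t^{k-l}$ for $k > l$. Both forms are our reading of the representation (3.21), which is not reproduced here, so they are assumptions; the function names are hypothetical.

```python
import numpy as np

def resolvent(R_hat, t):
    """Assumed resolvent form: H = (I + t R_hat)^{-1}."""
    return np.linalg.inv(np.eye(R_hat.shape[0]) + t * R_hat)

def q_form(R_hat, s, t, k):
    """Assumed quadratic form: Q_k = t^k s^H H^k s."""
    H_k = np.linalg.matrix_power(resolvent(R_hat, t), k)
    return float(np.real(s.conj() @ H_k @ s)) * t ** k

rng = np.random.default_rng(2)
m, n = 6, 15
Y = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
R_hat = Y @ Y.conj().T / n                    # positive semi-definite sample matrix
s = np.ones(m, dtype=complex)

results = []
for t in (0.1, 1.0, 10.0):
    spectral_norm = np.linalg.norm(resolvent(R_hat, t), 2)
    ratio_21 = q_form(R_hat, s, t, 2) / q_form(R_hat, s, t, 1)
    ratio_31 = q_form(R_hat, s, t, 3) / q_form(R_hat, s, t, 1)
    results.append((t, spectral_norm, ratio_21, ratio_31))
print(results)
```

Because $\hat{R}$ is positive semi-definite, every eigenvalue of $H$ equals $1/(1 + t\lambda_i)$ and lies in $(0, 1]$, which is what both checks rely on.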
As a preliminary step, we upper bound a quantity $T_k$, defined by


Tk  —  AjE 


(B.4) 


In doing so, we first need to evaluate the corresponding derivatives of the
$Q_k$. For $k = 2$, the derivative of $Q_2$ is computed via


dQ2  _  ,2  *  d  (HpqHqu) 


p  f)Y* 

p-,q,u 


(«)  * 


E  (s"H'Y)  (H«s),  , 


(B.5) 


where equality (a) follows from (B.2). Given (B.5), it is easily generalized that


‘  o  l=l 


(B,6) 


Now, substituting (B.6) into (B.4), the upper bound for $T_k$ is found as follows


Ea.E  ^(s"H'Y)  (H-+'-‘s). 


(a)  ^y-2(/i;+l)  ^  r  2 

|(i=''H'Y)yH‘+>-'s),[ 

1=1  i,j  f 

kp{k+l)  JL 


(B.7) 


where inequality (a) follows from the inequality

$$\left|\sum_{l=1}^{k} a_l\right|^2 \le k \sum_{l=1}^{k} |a_l|^2. \tag{B.8}$$
The summand in (B.10) is upper bounded as follows


E  [(s-^H'YY^H's)  (s-^H 


fe+l— j  jfeH-1— 


1] 


(a) 


<  E  r||sf  ||H||2(^+^)||D||||YY^ 


(b) 

<  E 


(c) 

<  S^DmU 


\ 


E 


/YY 


H\  2 


\  n 


=  S^DmU 


\ 


E 


i=l 


id) 

<  S^DynDraU^m 


(B.9) 


where inequality (a) follows from the Cauchy-Schwarz inequality for matrix
norms, i.e., $\|AB\| \le \|A\|\|B\|$. Inequality (b) follows from the boundedness
of the signal replica vector and of the matrices $H$ and $D$, in particular
$\|D\| \le D_m$ and $\|H\| \le 1$. Inequality (c) follows from the Cauchy-Schwarz
inequality for the expectation operator. Finally, (d) follows from the
boundedness of the eigenvalues of the SCM $\hat{R}$.


The upper bound of $T_k$ is finally obtained by substituting (B.9) into (B.10)
such that


m 


n 


(B.IO) 


Back  to  the  variance  problem,  it  can  be  shown  that  /(Y)  being  P  or 


such  that 


dYi 


var 


(/(Y))<2  5^A,E 


df{Y) 


<9Y 


^  d.f(Y)  ^ 
85  ’  dY,* 


(B.ll) 


In the following, the variance of each of the estimators and their derivatives
is upper bounded. The derivation for $f(Y) = P_a$ is given in more detail. The
other derivations are similar and thus only outlined.
Upper Bound of the Variance of $f(Y) = P_a$


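A derivative of $P_a$ with respect to $Y_{ij}^*$ is expressed in terms of the
$Q_k$ as

$$\frac{\partial P_a}{\partial Y_{ij}^*} = \frac{1}{tQ_1^2}\left[\left(\frac{2Q_2}{Q_1} - t\right)\frac{\partial Q_1}{\partial Y_{ij}^*} - \frac{\partial Q_2}{\partial Y_{ij}^*}\right]. \tag{B.12}$$

Using (B.8), the squared magnitude of (B.12) is upper bounded as

$$\left|\frac{\partial P_a}{\partial Y_{ij}^*}\right|^2 \le \frac{2}{t^2Q_1^4}\left[\left(\frac{2Q_2}{Q_1} - t\right)^2\left|\frac{\partial Q_1}{\partial Y_{ij}^*}\right|^2 + \left|\frac{\partial Q_2}{\partial Y_{ij}^*}\right|^2\right]. \tag{B.13}$$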

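The coefficient of the first term in the above expression is, using (B.8) and
(B.3), upper bounded by $10t^2$. Further, using (A.1) to lower bound the
quadratic form $Q_1$ yields

$$\left|\frac{\partial P_a}{\partial Y_{ij}^*}\right|^2 \le \frac{2(1 + tD_m)^4}{t^6 m^4}\left[10t^2\left|\frac{\partial Q_1}{\partial Y_{ij}^*}\right|^2 + \left|\frac{\partial Q_2}{\partial Y_{ij}^*}\right|^2\right]. \tag{B.14}$$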


Substituting (B.14) into (B.11) yields


var  (j^a 

<  28{1 +  tD^yD,aDmSi^ 

nm* 

(b)  K 

—  j7^5/2  ’ 


(B.15) 


where inequality (a) follows from (B.10) and (b) from the assumption that $m$
and $n$ are of the same order and $S_m = O(m^{1/2})$.


Upper Bound of the Variance of $f(Y) = P_b$


Similarly to the previous part, the derivative of $P_b$ with respect to $Y_{ij}^*$ is given by


dPb  _  1  dQi 

dY,*  tQl  dS  ■ 


(B.16) 
Substituting (B.16) into (B.11) yields


Exploiting the upper bound for $T_1$ (B.10) and the lower bound for $Q_1$ (A.1)
leads to the upper bound for the variance of $P_b$

var  (Pb)  <  2t~^{l  +  tDraYDmDmSm^^-  (B.18) 

V  /  nm^ 

Thus,  under  the  assnmption  that  m  and  n  are  of  the  same  order  and  Sm  = 
var  fA)  <  Km~^,  for  some  positive  hnite  constant  K. 


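Upper Bound of the Variance of $f(Y) = \partial P_a / \partial \delta$


A derivative of $P_a' = \partial P_a / \partial \delta$ with respect to
$Y_{ij}^*$ is expressed in terms of the $Q_k$ as

$$\frac{\partial P_a'}{\partial Y_{ij}^*} = \frac{2}{tQ_1^2}\left[\left(\frac{3Q_2^2}{Q_1^2} - \frac{2Q_3}{Q_1}\right)\frac{\partial Q_1}{\partial Y_{ij}^*} - \frac{2Q_2}{Q_1}\frac{\partial Q_2}{\partial Y_{ij}^*} + \frac{\partial Q_3}{\partial Y_{ij}^*}\right]. \tag{B.19}$$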


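The squared magnitude of the above expression is, using (B.8) and (B.3), upper
bounded as

$$\left|\frac{\partial P_a'}{\partial Y_{ij}^*}\right|^2 \le \frac{12}{t^2Q_1^4}\left[26t^4\left|\frac{\partial Q_1}{\partial Y_{ij}^*}\right|^2 + 4t^2\left|\frac{\partial Q_2}{\partial Y_{ij}^*}\right|^2 + \left|\frac{\partial Q_3}{\partial Y_{ij}^*}\right|^2\right]. \tag{B.20}$$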


Substituting (B.20) into (B.11) and using the upper bounds for $T_k$ (B.10) and
the lower bound for $Q_1$ (A.1), the variance of $P_a'$ is upper bounded as

var  (Pb)  <  612t\l  +  (B.21) 

i.e.,  if  Sm  =  0{^/rri)  and  m  and  n  are  of  the  same  order,  var  <  Km~i,  where 
K  is  some  positive,  hnite  constant. 
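Upper Bound of the Variance of $f(Y) = \partial P_b / \partial \delta$

A derivative of $P_b' = \partial P_b / \partial \delta$ with respect to
$Y_{ij}^*$ is expressed in terms of the $Q_k$ as

$$\frac{\partial P_b'}{\partial Y_{ij}^*} = \frac{1}{Q_1^2}\left[\frac{\partial Q_2}{\partial Y_{ij}^*} - \frac{2Q_2}{Q_1}\frac{\partial Q_1}{\partial Y_{ij}^*}\right]. \tag{B.22}$$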


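The squared magnitude of the above expression is, using (B.8) and (B.3), upper
bounded as

$$\left|\frac{\partial P_b'}{\partial Y_{ij}^*}\right|^2 \le \frac{2}{Q_1^4}\left[\left|\frac{\partial Q_2}{\partial Y_{ij}^*}\right|^2 + 4t^2\left|\frac{\partial Q_1}{\partial Y_{ij}^*}\right|^2\right]. \tag{B.23}$$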


Substituting (B.23) into (B.11) and using the upper bounds for $T_k$ (B.10) and
the lower bound for $Q_1$ (A.1) leads to


var 


<  20t^(l  +  WmYDmDmStn 


y/m 

nm^ 


(B.24) 


Therefore,  under  the  assumptions  that  m  and  n  are  of  the  same  order  and  Sm  = 
0{y/m),  var  (Pb)  <  Km~Y  where  K  is  some  positive,  hnite  constant. 
Appendix C


Optimization  Loss 


Let $f(x) = a(x - \tilde{x})^2$ and $g(x) = b(x - \tilde{x}) + c$ be,
respectively, quadratic and linear functions. The function $f(x)$ reaches its
minimum at $x = \tilde{x}$. On the other hand, the minimizer of the sum
$h(x) = f(x) + g(x)$ is, after setting the first derivative of $h(x)$ to zero,
given by

$$x^* = \tilde{x} - \frac{b}{2a}. \tag{C.1}$$

Therefore, the minimum of $h(x)$ is

$$h(x^*) = c - \frac{b^2}{4a}. \tag{C.2}$$

The error made by setting the minimizer of $h(x)$ to be $\tilde{x}$ instead of
$x^*$ is

$$h(\tilde{x}) - h(x^*) = \frac{b^2}{4a}. \tag{C.3}$$


This  result  is  used  to  approximate  the  MSE  loss  made  by  optimizing  the  squared 
bias  instead  of  the  sum  of  the  squared  bias  and  variance.  Therein,  the  approximation 
of  the  squared  bias  is  f{x),  while  the  approximation  of  the  variance  is  g{x). 
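
The optimization loss can be verified with a few lines of code. The sketch below evaluates $h(x) = a(x - \tilde{x})^2 + b(x - \tilde{x}) + c$ for $a > 0$ and compares the loss incurred by using $\tilde{x}$ in place of the true minimizer against the closed form $b^2/(4a)$ obtained by setting the derivative of $h$ to zero. The function name and the numerical values are hypothetical and chosen only for illustration.

```python
def optimization_loss(a: float, b: float, c: float, x_tilde: float):
    """Return (x_star, h_min, loss) for h(x) = a (x - x_tilde)^2 + b (x - x_tilde) + c."""
    def h(x: float) -> float:
        return a * (x - x_tilde) ** 2 + b * (x - x_tilde) + c

    x_star = x_tilde - b / (2 * a)       # stationary point of h, a minimum when a > 0
    loss = h(x_tilde) - h(x_star)        # penalty for using the minimizer of f alone
    return x_star, h(x_star), loss

a, b, c, x_tilde = 2.0, 3.0, 1.0, 0.5
x_star, h_min, loss = optimization_loss(a, b, c, x_tilde)
print(x_star, h_min, loss, b ** 2 / (4 * a))
```

In the bias-variance setting described above, $a$ is the local curvature of the squared bias and $b$ is the local slope of the variance, so the loss is small when the variance changes slowly relative to the curvature of the squared bias.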
Nomenclature


m  number  of  coefficients  (dimensionality  of  the  observation  space) 
$q_k$  the eigenvector of the ensemble correlation matrix corresponding to $\lambda_k$

$\hat{\lambda}_k$  the $k$-th largest eigenvalue of the sample correlation matrix

$\hat{q}_k$  the eigenvector of the sample correlation matrix corresponding to $\hat{\lambda}_k$
$r$  cross-correlation vector

$\hat{r}$  estimate of the cross-correlation vector

empirical Eigenvalue Distribution Function corresponding to $A_m$
empirical Eigenvalue Density Function corresponding to $A_m$
$\mu_A(x)$  limiting Eigenvalue Density Function

$S_{A_m}(z)$  empirical Eigenvalue Stieltjes Transform corresponding to $A_m$
$S_A(z)$  limiting Eigenvalue Stieltjes Transform

$F_{A_m}(z)$  empirical Eigenvector Stieltjes Transform corresponding to $A_m$
$n$  number of observations (snapshots, samples)

$F_A(z)$  limiting Eigenvector Stieltjes Transform
Mfc  empirical  k-th  moment  of  a  random  matrix 
Mfc  limiting  k-th  moment  of  a  random  matrix 
Mk  expectation of the k-th empirical moment of a random matrix

expectation of the k-th empirical moment of a matrix in the limit when its
order grows large

$\lambda$  forgetting factor

$\lambda_{\mathrm{opt}}$  optimal forgetting factor

$w$  vector of adaptive processor weights

$d$  separation between sensors in a line array

$s$  signal replica vector

$N$  number of sensors

$\theta$  elevation angle

$\hat{P}$  power estimate

$P$  true power

MSE($\delta$)  estimation MSE of power estimator for diagonal loading $\delta$
bias$^2$($\delta$)  squared bias of power estimator for diagonal loading $\delta$
var($\delta$)  variance of power estimator for diagonal loading $\delta$
$\delta$  diagonal loading

$\delta_{\mathrm{opt}}$  diagonal loading which minimizes MSE($\delta$)

$\tilde{\delta}_{\mathrm{opt}}$  diagonal loading which minimizes bias$^2$($\delta$)
$v(n)$  observation noise at discrete time $n$

$c$  ratio between the number of coefficients and number of observations
$d(n)$  channel output at time $n$
$\sigma_v^2$  variance of the observation noise
variance of the process noise
e  channel estimation error

$\xi$  signal prediction error

$L_{\mathrm{ff}}$  feedforward filter length

$L_{\mathrm{fb}}$  feedback filter length

$x_{\mathrm{soft}}$  soft decision estimate

$\sigma^2_{\mathrm{MMSE}}$  signal prediction MSE of the MMSE processor

$\sigma^2_{\mathrm{LS}}$  signal prediction MSE of the LS processor

$u(n)$  observation vector received at discrete time $n$

$R$  ensemble correlation matrix

$\hat{R}$  sample correlation matrix

$\hat{R}_\delta$  diagonally loaded sample correlation matrix

$\lambda_k$  the $k$-th largest eigenvalue of the ensemble correlation matrix